Rebooting is for Windows

By apexwm, 22 July, 2010 17:46

In the world of datacenters and servers, minimizing downtime is crucial, because downtime can be very costly. The first step is to prepare ahead of time, which can involve setting up redundant servers or server clusters and can incur extra hardware costs along the way. Then, when downtime does happen, it keeps users from accessing the servers and can cause a snowball effect: services are unavailable and business cannot be carried on. This can be just as costly as the hardware itself, if not more so.

So let's look at two of the most common operating systems used today in datacenters and on server systems: Windows on one hand, and Linux on the other.

Windows by nature has more downtime per system, because Microsoft releases patches that require frequent rebooting. Windows patches are scheduled to be released on the second Tuesday of each month, so at a minimum once per month Windows systems will need to reboot. Sometimes, patches are released even more frequently, depending on the severity. Windows just can't activate a majority of software updates without rebooting the entire system.

And now let's take a look at Linux. By nature, Linux is designed to run indefinitely unless the running kernel itself needs to be upgraded. However, there have been breakthroughs within the past year (Ksplice is one) where even the kernel itself can be upgraded without a reboot. This is just plain smart. The way the Linux kernel is designed allows it to run efficiently, and changes can be made to it while it is running. As long as the kernel is running, other changes to the operating system can be made as well. Sure, sometimes services need to be stopped and started (if they are upgraded), but this does not have nearly the impact of rebooting the entire system. Some services can also pick up changes without being restarted at all (as in a configuration change; the "service {service name} reload" command usually does the trick).
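As a rough illustration of what a "reload" can look like from the service's side, here is a minimal Python sketch. It assumes the service follows the common Unix convention of treating SIGHUP as a request to re-read its configuration; the class name `ReloadableService`, the JSON config format, and the file path are all hypothetical and not taken from any particular daemon.

```python
import json
import signal


class ReloadableService:
    """Hypothetical long-running service that can re-read its settings in place."""

    def __init__(self, config_path):
        self.config_path = config_path
        self.settings = {}
        self.reload_count = 0
        self.reload()

    def reload(self):
        # Re-read the configuration file; the process itself keeps running.
        with open(self.config_path) as handle:
            self.settings = json.load(handle)
        self.reload_count += 1

    def install_reload_handler(self):
        # Assumption: the service treats SIGHUP as "reload", a common Unix
        # convention. Call this from the main thread on a Unix system.
        signal.signal(signal.SIGHUP, lambda signum, frame: self.reload())
```

With the handler installed, sending SIGHUP to the process would trigger `reload()` without stopping the service, so existing connections and uptime are not affected by the configuration change.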

Rebooting causes downtime if the servers are not redundant, which is just plain bad practice unless the business does not operate 24x7. But rebooting often causes other problems to pop up. Sometimes the system will not boot back up properly (although this happens less often than it did with older versions of Windows), so the systems admin has to monitor the server each time it reboots, which takes extra time. Even worse, reboots normally happen during the night, so systems admins have to be awake when they occur to ensure everything is successful.

What this means is that a small business providing something like file sharing can use one single Linux server or two Windows servers to get the same amount of planned uptime. Having two redundant servers significantly increases the hardware cost, to more than double in most cases: not only do two servers need to be purchased, but they must be configured with shared storage and set up to load balance or provide failover. The advantage of having two Windows servers, however, is protection against a major hardware failure (e.g. a motherboard), which would make the single Linux server unavailable. In this example, though, I'm only focusing on planned downtime.
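To put a number on planned downtime, here is a small Python sketch. The figures are assumptions chosen only for illustration: one reboot per month at 15 minutes each, compared with a single reboot per year of the same length. The function name `planned_availability` is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def planned_availability(reboots_per_year, minutes_per_reboot):
    """Fraction of the year a single server is up, counting only planned reboots."""
    downtime = reboots_per_year * minutes_per_reboot
    return 1 - downtime / MINUTES_PER_YEAR


# Assumed figures: one reboot per month at 15 minutes each, versus one
# reboot per year of the same length.
monthly = planned_availability(12, 15)
yearly = planned_availability(1, 15)
print(f"monthly reboots: {monthly:.4%}")
print(f"yearly reboot:   {yearly:.4%}")
```

The calculation ignores unplanned outages and the admin time spent watching each reboot, so it understates the real cost on both sides.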

I would have to assume that Microsoft may someday address the issue of the Windows operating system not allowing updates while it continues to run. Maybe. I've never been able to pinpoint exactly why Microsoft does not focus on making the kernel in Windows more efficient. Linux has been able to do this since it was first created in the early 1990s, and Unix was doing the same even before that. In the meantime, businesses should do their research to ensure that whatever operating system they choose will work with their budget, business requirements, and uptime requirements.

 

Talkback

Very interesting. Let me add some specific numbers to the information you outlined. I have been working with Linux, Solaris, and DEC/Compaq/HP Unix (Tru64) systems for a very long time. In my personal experience, it is not at all unusual for one of these servers to be running without reboot (or crash) for over 1,000 days. That's close to three years. As you pointed out, it is unusual for a Windows server to survive more than a month or two without a mandatory reboot due to "critical" patches from Microsoft, or a plain old crash. Ugh.
J.A. Watson 23 July, 2010 13:10

I've long wondered how Windows servers are managed with all this rebooting after regular updates, and what the real cost of this comes to in man hours, redundant equipment etc., on top of expensive and wasteful licensing terms and costly upgrades, not to mention disruptive service packs.
Moley 23 July, 2010 14:23

Hi Jamie, thanks for sharing. I can see how those systems can easily run for that long. Since I started using Linux back in the late 1990s, the longest I've seen Linux servers run is about 400 days. Eventually they needed to be rebooted because of a power outage and we had no backup generator to keep them running. Overall, when Linux servers I've administered needed a reboot, it was because of a power or hardware issue. I can only recall a couple of times where I had to reboot a machine running Linux because it actually crashed (which is almost unheard of), whereas with Windows I've long lost track because there have been so many over the years. I'm sure the number is in the HUNDREDS. In fact, a while back I was helping a company that installed Windows updates manually on their servers. That way they could keep them running for as long as possible, then install the updates and reboot at a pre-planned time. However, what we found was that the servers became unstable on their own after running for 3 months or more, and needed to be rebooted anyway. Since then, they've gone back to rebooting once per month, and the servers actually run better that way. To me, this just points out that Windows is still seriously flawed.

Another interesting thing to do is see who is running flavors of Unix/Linux on sites like the Netcraft uptime website:

http://uptime.netcraft.com/

and compare to those using Windows. I've found that those using Windows show frequent reboots. Those using Unix/Linux show infrequent reboots, and uptimes of hundreds of days.
apexwm 23 July, 2010 14:34

Moley : I haven't had the time to analyze that cost in detail, but I can tell you that it is very costly if somebody were to actually calculate it out. What many companies fail to see is all of the hidden costs that are associated (employee time and non-productivity of downtime being probably the most costly). It's difficult to get numbers for these items as it requires tracking and good reporting capabilities, which Microsoft is probably happy about.
apexwm 23 July, 2010 14:41

Windows needs to reboot so that it can update the registry...

Could there ever be a Windows with no registry?
roger andre 24 July, 2010 01:17

"Could there ever be a Windows with no registry?"

No, countless thousands of third party applications rely on and modify the registry.

Windows was never designed to be used as anything other than a stand-alone PC operating system. In the early 1990s MS decided to muscle in on Novell's dominance of the PC server market, so they blu-tacked some file sharing code onto Windows NT, wrote 'server' on the box with a crayon, disabled some of the desktop features to make it look a bit different & multiplied the price by ten. The rest is history.
AndyPagin 26 July, 2010 11:35

Actually, a Windows without a registry (at least, one the way it is implemented now) is possible - all access to it is done through a set of APIs that's fixed. All you have to do is to replace the persistence implementation with something more... civilized.
rbanffy 27 July, 2010 22:26

See Windows 2008 Server Core - how to install Windows without a GUI. It addresses some of the issues raised here (but only some).

E.g. http://www.petri.co.il/understanding-windows-server-2008-core.htm
cowhamr 28 July, 2010 12:56

This article is little more than fanboys beating off in a closet. I have both Windows and Linux servers. I don't reboot my Windows boxes nightly or even monthly because I don't necessarily need to install the latest patches. Most boxes have been running 400-500 days, and they don't f-ing crash. It's 2010 and we aren't running NT4 or 2000. I can tell that the respondents so far are hands-on techies because no one has mentioned the difference in caliber of staff required to run a Windows farm vs a Linux farm. Our Linux guys are necessarily a LOT sharper and more flipping expensive. You guys talk about licensing costs... Stack our Linux staff against Windows staff WITH Server 2003 licenses on VMs and there is no contest: Linux costs us more.
Isfeasachme 28 July, 2010 13:22

Windows has had its APIs tangled in the past. The first effort to effectively separate them started after the launch of Windows XP and only began to show results with Windows 7. About the time Windows 7 launched, Microsoft completed the task of isolating the core of Windows in MinWin, although you can't run or install it on its own, because Microsoft is strict about its compatibility policy and, as they say, MinWin is pretty useless by itself. But for Windows 8 the plan is to allow installing and updating extensions to the kernel much like Linux packages, while running MinWin alone would replace "safe" mode.

On MinWin:
http://www.betanews.com/article/Mark-Russinovich-on-MinWin-the-new-core-of-Windows/1259792850
http://www.zdnet.com/blog/microsoft/stripped-down-minwin-kernel-to-be-at-the-core-of-windows-7-and-more/842
http://arstechnica.com/microsoft/news/2009/11/inside-minwin-the-windows-7-kernel-slims-down.ars
http://en.wikipedia.org/wiki/MinWin

As mentioned by @rbanffy, a Windows without a registry is possible, but Windows wasn't built, it grew; that is, Windows gets new features as needed, without any thought for isolation or abstraction.

"I would have to assume that Microsoft may someday address the issue of their Windows operating system not allowing updates while it continues to run. Maybe. I've never been able to pinpoint exactly why Microsoft does not focus on making the kernel in Windows more efficient."

I'll cite Mark Russinovich for you:
"If you look back at the evolution of Windows, it's evolved very organically, where components are added to the system and features are added to the system without, in the past, any real focus on architecture or layering."
Theraot 28 July, 2010 13:35

I understand Windows needs to reboot when certain DLLs are in-use in memory. Rebooting allows those to be replaced on the file system, without the read-only lock imposed by the file being in-use.

This is why not all Windows updates require a reboot, but those replacing critical system files invariably do.
jaymie 28 July, 2010 14:04

Theraot : The Mark Russinovich quote definitely sums up why Windows is so bloated. However, my argument would be to look at the Linux kernel: since its early development in the 1990s it has managed to evolve substantially, and even today it is still fairly lean and not nearly as bloated in comparison.

Isfeasachme : I am basing all of my conclusions on personal experience with both operating systems. Over more than a decade of running both side by side, even with Windows Server 2003/2008, it has become clear to me which operating system is superior. There are countless posts all over that support Linux being a more stable OS than Windows. There are also those that strongly support Windows, but I often doubt the source of those articles, as in a lot of cases they seem to be sponsored and misleading. Yes, I only posted negative experiences with Windows. I've seen Windows machines run for longer than a month, but consistently when they run longer, they degrade and become unstable. Linux doesn't. Everybody's network is different; however, Windows patches are often more critical, and not patching the box for an extended period of time is in my opinion asking for trouble, especially if the box is exposed to the Internet. If it's on a protected network, then the risk isn't as great. And for your point on Linux admins being more expensive, you have to keep in mind that Linux admins are not as common as those that know Windows, so by nature of supply & demand, they might be able to request a higher salary. As far as technical knowledge goes, an OS is an OS, and Linux by design is much less bloated than Windows, and in my opinion much easier to maintain, and requires no babysitting the way Windows does. TCO of Linux is lower in a majority of cases, when you factor in ALL expenses. Not just hard numbers like salaries and licensing costs, but less obvious numbers like wasted employee time maintaining the machines, overtime, extra babysitting and monitoring, downtime, etc.

As to the other comments about no registry... in my opinion the registry is a single point of failure for Windows. If it becomes corrupted, so does Windows. Linux does not have a registry, and everything is simplified. Items that Windows would keep in the registry are, on Linux, usually kept in plain text files. If one config file becomes corrupted, it usually doesn't affect the entire OS like a corrupted registry would.
apexwm 28 July, 2010 14:16

-> "Windows patches are scheduled to be released on the second Tuesday of each month, so at a minimum once per month Windows systems will need to reboot."

Hmm... Something tells me that you're using a combination of history and desktop client logic and applying it to the server space. Some counter points to your article:

1. Historically Windows required a reboot after patching *however* not all patches require reboots *and* a reboot is only necessary if a component being updated is in active use at the time the update is applied. I've applied MANY patches that didn't require a reboot.

2. Most IT shops with server farms avoid automatic updates like the plague because they want to validate that any given patch will not bring down the network.

3. Can't speak for all Linux distros, but I've done some updates on my Ubuntu box and it has required reboots even when nothing was in use but the shell.

4. The majority of "crashes" in Windows come from poorly written client applications or from poorly written drivers. I'm not saying Windows is perfect, but give credit where credit is due. It's because of this that the more recent Windows Server versions (2008 and 2008 R2) have things like the audio stack turned off and don't automatically install video card specific drivers. Instead they use generic drivers as a server should be just that. A server, not a desktop replacement.
PollyProteus 28 July, 2010 15:46

PollyProteus :

Yes, it is true that some patches do not require a reboot, but a majority of them do. It only takes one patch that needs a reboot, regardless of the others, to flag the server as needing a reboot, and away it goes. A majority of admins keep current with MS updates, and each month the servers need to reboot for at least one of the updates. I don't even recall a month where Microsoft released patches and the servers did NOT need to reboot. We've been applying all released critical patches on all servers, to make sure all of the bases are covered and everything is completely up to date. You might call it paranoid, but staying on top of security is essential in an enterprise environment. One outbreak from a vulnerability and the next thing you know it's hours and hours of repair work and downtime.

Your point about validating patches is a good one. I highly recommend this as well. I admit it's been a long time since I've seen a MS patch corrupt something. Either way, a majority of them require a reboot so that is still an issue.

The only patches that Linux should need to reboot for are for kernel upgrades. A change to X11 or possibly the Gnome environment may require a log off / log on in order for it to start up the new binaries. But a complete reboot to take down all services is only necessary with a kernel upgrade, because the kernel is actively running the OS.
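The reason running programs keep working while their files are upgraded underneath them can be shown with a short Python sketch. It assumes a POSIX system such as Linux, where a file can be replaced on disk while another handle still has the old copy open; the file names are hypothetical. On Windows the same `os.replace` call may fail while the target file is open.

```python
import os
import tempfile


def replace_while_open(directory):
    """Replace a file on disk while an already-open handle is still reading it."""
    target = os.path.join(directory, "libexample.so")  # hypothetical file name
    staged = os.path.join(directory, "libexample.so.new")

    with open(target, "w") as handle:
        handle.write("version 1")

    in_use = open(target)  # stands in for a running process using the file

    with open(staged, "w") as handle:
        handle.write("version 2")
    # On POSIX systems this atomically swaps the directory entry; the old
    # contents stay reachable through handles that were opened earlier.
    os.replace(staged, target)

    old_view = in_use.read()
    in_use.close()
    with open(target) as handle:
        new_view = handle.read()
    return old_view, new_view


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(replace_while_open(workdir))
```

The handle opened before the swap still sees the old contents, while anything opened afterwards gets the new file. That is why a service restart or a log off / log on is enough to pick up new binaries.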

I agree that a lot of Windows' problems are introduced by 3rd party additions, like drivers. However, on the flip side, Linux does not commonly have this issue because you do not have competing 3rd parties writing drivers. Microsoft came up with the WHQL to try and funnel drivers through one source. However, this process is not mandatory. With Linux, normally one driver is written per device, tested, approved, implemented into the kernel source, and it's done. This process is actually controlled more strictly than on Windows. It also increases the chance of problems being identified and squashed very quickly and efficiently.
apexwm 28 July, 2010 17:20

@Theraot: Do you have any citations for MS's plans with MinWin and Windows 8? I didn't see anything about future plans in the URLs you listed.
Danneely 28 July, 2010 21:34

re: the Linux guys are sharper and cost more. In other words, they are actually competent. Not applying patches to Windows so you don't have to reboot? That sounds like a good practice. Windows issues a fix and the techies (or the manager) decide "we don't need it, let's keep our computer up and running", thereby threatening everyone's security and functionality. Unless Windows patches are really a waste of time, which doesn't say much for MS staff!
Windows is getting much better, but it still is inefficient and wastes a lot of time. Even on my PC the updates/fixes are a pain: reboot, then it has to finish updating. Hmmm, 15 minutes to completely update x 1,000,000,000 users is a lot of downtime.
andreaL 12 February, 2011 18:56