Recently, an independent testing laboratory found NT on a 4-way
Xeon processor machine to be faster than Linux on the same
hardware. While this tidbit is interesting for many reasons, not
the least of which are the questionable testing procedures
and the name calling and mud slinging that have ensued, most people
have missed the really important point, which is: Quad Xeon
processor boxes are a very weak value proposition for Web
serving.
Let's do a quick case study of the price / performance ratio of
a quad Xeon processor box versus a load balanced server farm
suitable for either SMB file serving or Web serving. This document
covers only the servers themselves, not the extra details of
network configuration.
First, the hardware prices I quote are from
May 6, 1999. I will use Dell as the standard supplier, although any
supplier will likely give similar prices for the type of machines
we are looking for.
Build a solid, reliable server for enterprise class use. Uptime
should be as high as possible, preferably 100%. Cost should be kept
low, but performance should not suffer simply to save money.
The server(s) should be able to provide 100Mbit/sec or better
throughput under load as either a file server or as a static
content HTML server. It should house 25 Gigabytes of online storage
in a RAID or redundant configuration.
While the quad Xeon machine can meet and exceed most of these
goals, it does so at tremendous expense. A machine of this caliber
costs between $25,000 and $50,000, depending on what options you
choose.
That's a big chunk of change no matter how you look at it.
Let's compare that to a farm of Linux (or
FreeBSD/NetBSD/OpenBSD) boxes running under load balanced switches.
Since a web serving farm scales at a nearly linear rate, there is
no sense in buying the fastest machines made. What we want to shoot
for is the most bandwidth per buck in each unit. For a Linux farm,
dual CPU machines represent a fairly good trade off of price and
performance.
We basically want to maximize performance and minimize cost.
Since licensing isn't an issue with Linux or FreeBSD, and the
energy required for each machine is fairly low, we probably have to
worry more about shelf space than anything else.
From Dell, a Dual Pentium III 450 with 256 Megs of RAM, dual
100BaseTX NICs, low end video card, and four 9 Gig Ultra Wide hard
drives sells for $4636.
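As a quick sanity check against the 25 gigabyte storage requirement, here is a minimal sketch of the usable space on one of these machines. It assumes all four 9 Gig drives go into a single RAID 5 array, where one drive's worth of space is spent on parity. The function name is my own, chosen for illustration.

```python
def raid5_usable_gigs(drive_count, drive_gigs):
    """Usable capacity of a RAID 5 array: one drive's worth goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_gigs

REQUIRED_GIGS = 25  # online storage goal stated in the requirements

# Assumption: all four 9 Gig drives are members of one RAID 5 array.
usable = raid5_usable_gigs(drive_count=4, drive_gigs=9)
print(f"usable: {usable} gigs, meets requirement: {usable >= REQUIRED_GIGS}")
```

Under that assumption each machine ends up with 27 gigs usable, which clears the 25 gig goal with a little room to spare.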
Four of these machines will cost us $18,544 plus shipping. We'll
call it $20,000 for four machines. Note that we will have a total
of eight 450 MHz CPUs, with a total of 1 Gig of RAM between them.
Note that if you don't already have a load balancing switch to use,
you may need to buy another machine, for about $3,000, to
balance the loads. While this machine will not need the large
RAID array of the farm servers, it will need plenty of
memory (hey, if you're building a balancer, you should put squid on
it too) and four or more NICs, and probably an FDDI card as well.
That will bring our bill to just under the cost of the lower end
Quad Xeon machines.
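The arithmetic behind those figures can be sketched in a few lines. The prices are the ones quoted above; the rounding of hardware plus shipping to $20,000 follows the text, and the variable names are only illustrative.

```python
UNIT_PRICE = 4636       # one dual Pentium III 450 server, as quoted from Dell
SERVER_COUNT = 4
BALANCER_PRICE = 3000   # optional load balancing machine
XEON_LOW_END = 25_000   # cheapest quad Xeon configuration

def farm_hardware_cost(servers, unit_price):
    """Raw hardware cost of the farm servers, before shipping."""
    return servers * unit_price

hardware = farm_hardware_cost(SERVER_COUNT, UNIT_PRICE)

# The article rounds hardware plus shipping up to $20,000.
farm_budget = 20_000 + BALANCER_PRICE

print(f"hardware before shipping: ${hardware:,}")
print(f"farm budget with balancer: ${farm_budget:,}")
print(f"left over versus low end Xeon: ${XEON_LOW_END - farm_budget:,}")
```

Shipping costs are not itemized in the quote, so the $20,000 figure is a rounded estimate rather than an invoice total.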
                             Quad Xeon                    Farm
Price                        $25,000 (to $50,000)         $20,000 ($23,000 with balancer)
# of CPUs                    4 x 500MHz (1 Meg cache)     8 x 450MHz (512k cache)
Aggregate network bandwidth
Drive storage (RAID 5)       27 gigs TOTAL hardware RAID  27 gigs software RAID / machine
Even at the lowest cost of $25,000, the Xeon machine is still
running half as many CPUs, and has fairly poor fault tolerance.
Price / Performance evaluation
While the quad Xeon machine may be as fast as the Linux farm, the
cost of the operating system ($799 more) and lack of redundancy
make it a weak value proposition. Also, its poor options for
upgrades make it a dead end system.
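One rough way to put a number on the value proposition is aggregate CPU clock speed bought per $1,000. This is a crude proxy of my own choosing, not a benchmark: it ignores cache size, SMP overhead, disk and network limits, and the operating system cost. It uses the low end Xeon price and the farm price including the $3,000 balancer.

```python
def mhz_per_kilodollar(cpus, mhz_each, price_dollars):
    """Aggregate clock speed bought per $1,000 spent (a crude proxy only)."""
    return cpus * mhz_each / (price_dollars / 1000)

# Figures quoted in the article; farm price includes the balancer machine.
xeon = mhz_per_kilodollar(cpus=4, mhz_each=500, price_dollars=25_000)
farm = mhz_per_kilodollar(cpus=8, mhz_each=450, price_dollars=23_000)

print(f"Quad Xeon: {xeon:.1f} MHz per $1,000")
print(f"Farm:      {farm:.1f} MHz per $1,000")
print(f"Farm advantage: {farm / xeon:.2f}x")
```

By this measure the farm buys nearly twice the raw clock speed per dollar, and the gap only widens toward the $50,000 end of the Xeon price range.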
Failure Mode Analysis
We now look at the common types of failures for servers, and
speculate about how these failures would affect each type of web
server.
Hardware failure, soft
  Effect on quad Xeon: System slows down as a whole. Depending on
  the failure you may have to take down your server and replace the
  failed component (e.g. NIC failure, single hard drive failure,
  single power supply failure). Performance loss could be anywhere
  from 25% if a NIC fails to 70% or more if a hard drive fails.
  Effect on Farm: System slows down as a whole. One of the four
  servers may be significantly slowed down, and may need to be taken
  down for a few hours to be fixed. However, the rest of the farm
  stays up. A single machine failing may result in 25% performance
  loss maximum. Other minor failures (one NIC failing, one hard drive
  failing) could result in 8 to 12% performance degradation.
Hardware failure, hard
  Effect on quad Xeon: You must repair the system to bring it back
  online. At best you may be able to restore partial operability
  after removing the failed component and restarting without it.
  Performance loss is 100% until the server is fixed.
  Effect on Farm: System slows by 25%. You must repair the one bad
  server and bring it back online. If a component is not available,
  but the server can still operate in a degraded manner, you may be
  able to reinsert it into the farm until you can get the component.
  Note that high end workstations could be tasked to take over this
  job until the parts arrive. System stays up in a slightly degraded
  manner.
Software failure, soft
  Effect on quad Xeon: System stops temporarily. Users must wait
  while the server or a service is restarted.
  Effect on Farm: 25% performance degradation. System back at 100%
  when the failed server is restarted.
Software failure, hard
  Effect on quad Xeon: System stops. System administrator must be
  called on to bring system back up. Depending on damage, this may
  take several hours.
  Effect on Farm: 25% performance degradation. System administrator
  must be called on to bring system back up. Depending on damage,
  this may take several hours.
Hardware, soft: Single item fails, but does not shut down
server. System can be reconfigured on the fly to overcome these
problems. Example: Failed NIC or hard drive in a RAID array.
Hardware, hard: Single or multiple items fail. Results in
shutdown of the unit affected. Example: CPU locks up memory
bus, server catches on fire, power spike kills both power supplies
in the Xeon, etc.
Software, soft: Server or a service on it crashes. Must reboot
server or restart service.
Software, hard: Server or service software becomes heavily
corrupted. Requires OS and / or service to be completely
reinstalled and tested before being placed back online.
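The percentages in the failure analysis mostly fall out of simple division. Here is a minimal sketch, assuming load is spread evenly across identical units and that the balancer itself stays up. The function name is hypothetical.

```python
def capacity_after_failures(units, failed):
    """Fraction of total capacity left when `failed` of `units` equal units are down."""
    if not 0 <= failed <= units:
        raise ValueError("failed must be between 0 and units")
    return (units - failed) / units

# One whole server lost: the single Xeon box versus the four machine farm.
print("quad Xeon:", capacity_after_failures(units=1, failed=1))
print("farm:     ", capacity_after_failures(units=4, failed=1))

# One of the farm's eight NICs lost (two NICs in each of four machines).
print("farm NIC: ", capacity_after_failures(units=8, failed=1))
```

Losing one of four servers leaves 75% of capacity, which is the 25% loss quoted for the farm, while the single Xeon box drops to zero. One NIC out of eight works out to a 12.5% loss, roughly in line with the 8 to 12% estimate for minor failures.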
So, imagine you've got one of these two setups running, and they
are working fine. Average CPU loads are sitting at below 50%, and
the customers are happy. Then, you get a large contract or you
start to advertise. Suddenly, your CPUs are averaging 80% during
the day, peaking at 100%, and you are getting error_logs full of
messages about time-outs.
It's time to upgrade. How do you upgrade the quad Xeon machine?
You don't, really. You replace it. At $25,000 it's hard to convince
the boss to replace a server that's less than one year old, when the
new server will only buy you another 6 months at best.
With the farm, you just buy new machines as you need them. And,
as faster machines come in, you can add them to a cluster one at a
time as you need them. Since they are standard dual processor
workstations, they can arrive much faster as well, days instead of
weeks. As long as you don't outrun your network connection speed,
you can keep increasing your farm size as needed.
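To get a feel for how many machines to order, here is a back of the envelope sketch. It assumes the nearly linear scaling claimed earlier is exactly linear, that all machines are equally fast, and that the network is not the bottleneck. The function name and the 50% target are illustrative.

```python
def servers_needed(current_servers, current_load_pct, target_load_pct):
    """Smallest farm size that brings average CPU load down to the target,
    assuming load divides evenly and scaling is linear."""
    if target_load_pct <= 0:
        raise ValueError("target load must be positive")
    total_work = current_servers * current_load_pct
    # Integer ceiling division avoids floating point rounding surprises.
    return (total_work + target_load_pct - 1) // target_load_pct

# Four servers averaging 80% CPU, aiming to get back to 50% or less.
needed = servers_needed(current_servers=4, current_load_pct=80, target_load_pct=50)
print(f"servers needed: {needed}, extra to buy: {needed - 4}")
```

With these numbers the farm grows from four machines to seven, bought one at a time as the load climbs. Faster replacement machines would need fewer units than this estimate suggests.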
Well, that's my take on it, what's yours?
Scott Marlowe is from Jacksonville, Florida and currently resides in
Englewood, Colorado. Scott is a curriculum developer, Web author,
and system engineer at a medium sized Internet oriented company. He
is the father of two, an amateur saxophonist and a Linux