---

Stephen Adler — From Word Processors to Super Computers

“Donald Becker Speaks about Beowulf at NYLUG.”

“… a Beowulf system… is a cluster of many off-the-shelf
PCs, running Linux, and tied together through a high-speed,
low-latency networking infrastructure.”

“… being an open source project was crucial in making it as
reliable as it is.”

“Usually one of the nodes is set aside and dedicated to managing
the rest of the nodes in the cluster. It’s the job
distributor.”

“Most of his systems had been up for over 100 days. I believe
some of the Beowulf clusters had been up for over 200 days. What is
important is not that a single machine has been up that long, but
that large numbers of machines have been up and running for that
amount of time.”

“If there is a bug found, then one can fix it by modifying a few
lines of code. That one module or program gets recompiled and you're
off and running again, with a minimum amount of administrative
work. If one works with closed source systems, it is often the case
that when a similar small bug is found and fixed, a whole cascade
of software upgrades results. This is due to the fact that the bug
fix will come in the form of a new software release. This release
then upgrades your shared libraries. The shared library upgrades
then force you to upgrade all your applications and on and on.
After which you are then forced into revalidating your whole
cluster for production use.”

