Linux high-performance cluster monitoring with Ganglia
Mar 07, 2009, 20:03
By Vallard Benincosa
"As data centers grow and administrative staffs shrink,
the need for efficient monitoring tools for compute resources is
more important than ever. The term monitor when applied to the data
center can be confusing since it means different things depending
on who is saying it and who is hearing it. For example:
"* The person running applications on the cluster thinks: 'When
will my job run? When will it be done? And how is it performing
compared to last time?'
* The operator in the network operations center (NOC) thinks: 'When
will we see a red light that means something needs to be fixed and
a service call placed?'
* The person in the systems engineering group thinks: 'How are our
machines performing? Are all the services functioning correctly?
What trends do we see and how can we better utilize our compute
resources?'
"Somewhere in this frenzy of definitions you are bound to find
terabytes of code to monitor exactly what you want to monitor. And
it doesn't stop there; there are also myriads of products and
services. Fortunately though, many of the monitoring tools are open
source -- in fact, some of the open source tools do a better job
than some of the commercial applications that try to accomplish the
same thing."