---

Linux Magazine: Fault-Tolerant MPI

“Today’s users of high performance computing systems (HPC) have
access to larger machines with more processors than ever before.
Even discounting systems such as the Earth Simulator, the ASCI-Q
machine, or IBM’s Blue Gene system–all of which consist of
thousands or even tens of thousands of processors–everyday
production clusters can easily consist of hundreds to a few
thousand processors. Future systems composed of a hundred thousand
processors are already on the drawing board and are expected to be
in service within the next few years.

“With such large systems, a critical issue is how to deal with
hardware and software faults that lead to process failures…”

