Linux Today: Linux News On Internet Time.

Linux Magazine: Fault-Tolerant MPI

Feb 11, 2005, 05:30
By Graham E. Fagg

"Today's users of high-performance computing (HPC) systems have access to larger machines with more processors than ever before. Even discounting systems such as the Earth Simulator, the ASCI-Q machine, or IBM's Blue Gene system (all of which consist of thousands or even tens of thousands of processors), everyday production clusters can easily consist of hundreds to a few thousand processors. Future systems composed of a hundred thousand processors are already on the drawing board and are expected to be in service within the next few years.

"With such large systems, a critical issue is how to deal with hardware and software faults that lead to process failures..."
