Linux Today: Linux News On Internet Time.

VNU Net: Clustered servers used to map human genome

Jun 27, 2000, 18:01
By John Leyden, VNU Net

High-performance computer technology provided the engine behind the mapping of the human genome, which has been described as one of the greatest scientific discoveries of all time.

An international team of scientists, including a group from the Wellcome Trust's Sanger Centre near Cambridge, said this week that they had decoded the three billion parts of our DNA make-up, raising hopes for the development of treatments for inherited disorders. But more research is still needed to reveal the structure and function of genes.

The supercomputers used by researchers at Celera Genomics, the Sanger Centre, and the Whitehead Institute included Compaq AlphaServers running Tru64 Unix and TruCluster software.

The human genome project took nearly 10 years to complete and was fraught with disagreements about ownership and access to data before an agreement on working together was reached.

Between the three centres, a staggering amount of data and computing power was used to decipher the 3.2 billion 'base pairs' that make up the genome.

To build a scalable and flexible infrastructure that would support up to 450 users, scientists at the Sanger Centre decided against using a Cray supercomputer, opting instead for a clustered array of servers connected by a high-speed 155/622Mbps ATM network.

Platform Computing's resource management software, LSF Suite, was instrumental in managing and optimising the 'supercluster' at the Sanger Centre that was used to crack the DNA code. The cluster comprised more than 250 Alpha-based Compaq servers running Tru64 Unix, along with Linux-based x86, SGI and Sun Microsystems systems, and many software resources.

"The raw computing power required to complete the project was unprecedented," said Phil Butcher, head of information technology at the Sanger Centre. "Given the need to run jobs that could take from a few minutes to many days to complete, we needed a cluster to run continually without crashing or interrupting our workload. LSF provided us with this."

The LSF Suite enabled scientists to run all 250 Compaq systems as a single 'virtual' computer. Using this collective processing power with all systems running in tandem, researchers were able to accomplish projects in a much shorter timeframe.

The Sanger Centre also employs a Compaq StorageWorks RAID system with four terabytes of disk space, and a 300GB Network Appliance RAID subsystem.

To assemble the base pairs in their correct order, Celera deployed more than 600 Alpha processors from Compaq, together capable of nearly a trillion operations per second.

The final assembly computations were run on Compaq's latest AlphaServer GS160 because the algorithms and data required 64GB of shared memory to run successfully.
