By John Leyden, VNU Net
High-performance computer technology provided the engine behind
the mapping of the human genome, which has been described as one of
the greatest scientific discoveries of all time.
An international team of scientists, including a team from the
Wellcome Trust’s Sanger Centre near Cambridge, said this week that
they had decoded the three billion parts of our DNA make-up, raising
hopes for the development of treatments against inherited
disorders. But more research is still needed to reveal the
structure and function of genes.
The supercomputers used by researchers at Celera Genomics, the
Sanger Centre, and the Whitehead Institute included Compaq
AlphaServers running Tru64 Unix and TruCluster software.
The human genome project took nearly 10 years to complete and
was marked by disputes over ownership of, and access to, the data
before the groups involved agreed to work together.
Across the three centres, vast quantities of data and computing
power were needed to decipher the 3.2 billion ‘base pairs’ that
make up the genome.
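To put the 3.2 billion figure in perspective, the sketch below estimates how much space a single copy of the sequence would occupy if each base were packed into two bits. The two-bit encoding and the helper names are illustrative assumptions, not a description of how any of the three centres stored their data, and the raw sequencing output they handled would have been far larger than one finished copy.

```python
# Back-of-envelope estimate only. The 2-bit packing is an assumption made
# for illustration; the article does not say how the data was stored.
BASE_PAIRS = 3_200_000_000  # figure quoted in the article
BITS_PER_BASE = 2           # four bases (A, C, G, T) fit in 2 bits


def packed_size_bytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Return the bytes needed to hold the sequence once, rounded up."""
    return (base_pairs * bits_per_base + 7) // 8


def encode(sequence: str) -> bytes:
    """Pack a DNA string into 2 bits per base (hypothetical helper)."""
    codes = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = bytearray(packed_size_bytes(len(sequence)))
    for i, base in enumerate(sequence):
        # Four bases share one byte; each takes its own 2-bit slot.
        out[i // 4] |= codes[base] << (2 * (i % 4))
    return bytes(out)


if __name__ == "__main__":
    size = packed_size_bytes(BASE_PAIRS)
    print(f"{size:,} bytes, about {size / 1e9:.1f} GB")
```

Under these assumptions one packed copy of the genome comes to 800 million bytes, which suggests the scale of the project came from the volume of raw readings and the computation needed to assemble them rather than from the size of the finished sequence.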
To build a scalable and flexible infrastructure that would
support up to 450 users, scientists at the Sanger Centre decided
against using a Cray supercomputer, opting instead for a clustered
array of servers connected by a high-speed 155/622Mbps ATM
network.
Platform Computing’s resource management software, LSF Suite,
was used to manage and optimise more than 250 Alpha-based Compaq
servers running Tru64 Unix, along with Linux-based x86, SGI and Sun
Microsystems systems and the many software resources in the
‘supercluster’ that the Sanger Centre used to crack the DNA
code.
“The raw computing power required to complete the project was
unprecedented,” said Phil Butcher, head of information technology
at the Sanger Centre. “Given the need to run jobs that could take
from a few minutes to many days to complete, we needed a cluster to
run continually without crashing or interrupting our workload. LSF
provided us with this.”
The LSF Suite enabled scientists to run all 250 Compaq systems
as a single ‘virtual’ computer. By pooling the processing power of
every system and running them in parallel, researchers were able to
complete projects in far less time.
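The idea of treating many machines as one ‘virtual’ computer can be illustrated with a minimal sketch of a batch queue that hands each waiting job to whichever host becomes free first. This is a simplified, hypothetical model and not LSF’s actual scheduling policy, which the article does not describe; the function name, job lengths and host counts are invented for the example.

```python
import heapq


def schedule(job_hours, host_count):
    """Give each job, in queue order, to the host that is free earliest.

    Hypothetical toy model of a batch queue, not LSF's real algorithm.
    Returns (assignments, makespan), where each assignment is
    (job_id, host_id, start_hour, finish_hour).
    """
    # Min-heap of (time the host becomes free, host id).
    hosts = [(0.0, host_id) for host_id in range(host_count)]
    heapq.heapify(hosts)

    assignments = []
    makespan = 0.0
    for job_id, hours in enumerate(job_hours):
        free_at, host_id = heapq.heappop(hosts)
        finish = free_at + hours
        assignments.append((job_id, host_id, free_at, finish))
        makespan = max(makespan, finish)
        heapq.heappush(hosts, (finish, host_id))
    return assignments, makespan


if __name__ == "__main__":
    jobs = [4, 2, 2, 1, 1]  # invented job lengths in hours
    for hosts in (1, 2):
        _, total = schedule(jobs, hosts)
        print(f"{hosts} host(s): all jobs done after {total} hours")
```

In this toy model the same five jobs take 10 hours on one host and 5 hours on two, which is the effect the Sanger Centre was after on a much larger scale: users submit work to one queue and the software decides which machine runs it.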
The Sanger Centre also employs a Compaq StorageWorks RAID system
with four terabytes of disk space, and a 300GB Network Appliance
RAID sub-system.
To assemble the base pairs in their correct order, Celera
deployed more than 600 Alpha processors from Compaq. Together these
were capable of nearly a trillion operations per second.
The final assembly computations were run on Compaq’s latest
AlphaServer GS160 because the algorithms and data required 64GB of
shared memory to run successfully.