“Within ten years, sequencing machines were fast enough that
researchers could seriously consider cataloging the sequence of
three billion nucleotides that make up the 30,000–35,000 human
genes and creating a kind of blueprint of humanity. The Human
Genome Project was born. And with it came the task of making sense
of the three gigabytes of data that comprise our DNA. Add to that
all of the contextual information about the human genome —
published research about certain sequences and information on the
relationship between sequences — and the various algorithms for
analyzing the data, and finally the fact that the human genome is
merely one of many genomes to be mapped (the mouse genome is being
finished now) and you begin to have some very large and messy data
management problems. This is why, today, the computer has joined
the microscope and the rat’s cage as an essential part of the
biologist’s toolbox.

“University research environments; vast amounts of data that
need to be manipulated in customizable ways; a community of
technical people with shared goals; new data analysis techniques
cropping up on a regular basis. These are the hallmarks of the open
source problem set, and if ever there was a world ready for open
source software, the biological sciences are it. In the last ten
years bioinformaticists (people who use computers to process
biological information) have wholeheartedly embraced open source
tools; in turn, the work done by biologists has begun to have an
impact in the larger open source world.

“Established open source projects have proved particularly
useful to biologists in two areas: in number crunching, where
Linux-based Beowulf clusters are providing a high-performance and
inexpensive alternative to proprietary RISC systems, and in
scripting, where biologically-focused scripting libraries like
BioPerl and BioPython have become extremely popular tools for
writing quick queries to the numerous publicly available genomic
databases…”