|
GigaScience 2012
The future of DNA sequence archiving

Keywords: DNA, Sequence, Archive, Compression, Storage, Image

Abstract: The vast majority of living organisms utilise nucleic acid as their primary store of genetic information. The technology to sequence DNA routinely was developed in the 1970s, and advances since then have steadily reduced cost and increased output. As the cost of sequencing has fallen, the number of species for which partial or complete genetic information has been derived has risen at a corresponding pace: starting with the first complete sequence of the Phi X 174 virus [1] in 1977, continuing with the first complete bacterial genome, that of Haemophilus influenzae [2], in 1995, and followed by the genomes of hundreds of other organisms, including eukaryotes such as humans. Currently the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org/) databases hold complete genomes from 5,682 organisms and sequence from almost 700,000 organisms.

The intracellular enzymatic processes that manipulate DNA molecules are highly formulaic: this has allowed the development of sophisticated, flexible, and ever cheaper laboratory techniques in which DNA and RNA can be cut, ligated, interconverted and replicated in vitro. Coupled with the decreasing cost of sequencing, these techniques have made DNA a convenient readout for a variety of molecular biology assays. This started with the development of EST and cDNA technologies, was followed by high-throughput genome sequencing, progressed through routine large-scale transcriptome sequencing, and finally reached yet more intensive processes such as RNA-seq, ChIP-seq and DNaseI-seq.
We have even witnessed the development of DNA sequencing-based methods with no direct biological role, such as the mathematical exploration of a combinatoric space and the development of unique synthetic tags for property tracking.

DNA sequences determined for research purposes have been routinely archived since 1982, when the EMBL Data Library was founded. This was closely followed by the formation of GenBank, first at the US Department of Energy and then trans
|