OALib Journal期刊
ISSN: 2333-9721
费用：99美元

投递稿件

查看量	下载量

相关文章
更多...

Genome Biology 2009

De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data

DOI: 10.1186/gb-2009-10-9-r94

Scott DiGuistini, Nancy Y Liao, Darren Platt, Gordon Robertson, Michael Seidel, Simon K Chan, T Roderick Docking, Inanc Birol, Robert A Holt, Martin Hirst, Elaine Mardis, Marco A Marra, Richard C Hamelin, J？rg Bohlmann, Colette Breuil, Steven JM Jones

Full-Text Cite this paper Add to My Lib

Abstract:

The efficiency of de novo genome sequence assembly processes depends heavily on the length, fold-coverage and per-base accuracy of the sequence data. Despite substantial improvements in the quality, speed and cost of Sanger sequencing, generating a high quality draft de novo genome sequence for a eukaryotic genome remains expensive. New sequencing-by-synthesis systems from Roche (454), Illumina (Genome Analyzer) and ABI (SOLiD) offer greatly reduced per-base sequencing costs. While they are attractive for generating de novo sequence assemblies for eukaryotes, these technologies add several complicating factors: they generate short (typically 450 bp for 454; 50 to 100 bp for Illumina and SOLiD) reads that cannot resolve low complexity sequence regions or distributed repetitive elements; they have system-specific error models; and they can have higher base-calling error rates. To this point, then, de novo assemblies that use either 454 data alone, or that combine 454 with Sanger data in a 'hybrid' approach, have been reported only for prokaryote genomes, and no de novo assemblies that use Illumina reads, either alone or in combination with Sanger and 454 read data, have been reported for a eukaryotic genome.In principle, it should be possible to generate a de novo genome sequence for a eukaryotic genome by combining sequence information from different technologies. However, the new sequencing technologies are evolving rapidly, and no comprehensive bioinformatic system has been developed for optimizing such an approach. Such a system should flexibly integrate read data from different sequencing platforms while addressing sequencing depth, read quality and error models. Read quality and error models raise two challenges. First, while it is desirable to identify a subset of high quality reads prior to genome assembly, and established read quality scoring methods exist for Sanger sequence data, there are no rigorous equivalents for 454 or Illumina reads [1]. Second, error

Full-Text

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133