BMC Genomics  2009 

The effect of sequencing errors on metagenomic gene prediction

DOI: 10.1186/1471-2164-10-520


Abstract:

In this study, Sanger and pyrosequencing reads were simulated on the basis of models that take all types of sequencing errors into account. All metagenomic gene prediction tools showed decreasing accuracy with increasing sequencing error rates. Performance results on an established metagenomic benchmark dataset are also reported. In addition, we demonstrate that ESTScan, a tool for sequencing error compensation in eukaryotic expressed sequence tags, outperforms some metagenomic gene prediction tools on reads with high error rates, although it was not designed for the task at hand.

This study fills an important gap in metagenomic gene prediction research. Specialized methods are evaluated and compared with respect to sequencing error robustness. Results indicate that the integration of error-compensating methods into metagenomic gene prediction tools would be beneficial to improve metagenome annotation quality.

Metagenomes are analyzed through simultaneous sequencing of all species in a microbial community without prior cultivation under laboratory conditions. The result is usually a large collection of sequencing reads from many species, and the phylogenetic origin of each read is unknown. A major goal in all metagenomic studies is the identification of potential protein functions and metabolic pathways. Reliable gene predictions are the basis for correct functional annotation, and for the discovery of new genes with their functions.

Several gene prediction methods have been developed for the ab initio identification of protein coding genes in complete microbial genomes (e.g. GLIMMER and GeneMark [1,2]). These methods require an initial training phase on some data from the target genome, or training on the genome of a closely related species. Such conventional gene finders can in principle be applied to metagenomic data, given that single sequencing reads can be assembled into longer contigs in order to provide sufficient training data. The applicability of conventional
