Bio301: A Web-Based EST Annotation Pipeline That Facilitates Functional Comparison Studies

DOI: 10.5402/2012/139842


Abstract:

In this postgenomic era, a huge volume of information derived from expressed sequence tags (ESTs) has been accumulated for the functional description of gene expression profiles, and comparative studies have become increasingly important to biologists. To facilitate these comparative studies, we have constructed a user-friendly EST annotation pipeline with comparison tools on an integrated EST service website, Bio301. Bio301 includes regular EST preprocessing, BLAST similarity search, gene ontology (GO) annotation, statistics reporting, a graphical GO browsing interface, and microarray probe selection tools. In addition, Bio301 is equipped with statistical library comparison functions that use multiple EST libraries, based on GO annotations, for mining meaningful biological information.

1. Motivation

Expressed sequence tags (ESTs) [1] are short DNA sequences (usually 200 to 500 nucleotides long) derived by either unidirectional or bidirectional sequencing of cDNA libraries. The information generated from ESTs has been used not only to identify novel gene transcripts, gene locations, and intron-exon boundaries in human and mouse genome drafts [2, 3] but also to assess gene expression levels in given tissues [4]. The large volume of information generated by the rapidly increasing number of ESTs (59 million EST entries in dbEST in January 2009) provides an excellent resource for comparative studies, so we have constructed an EST service website, Bio301, to facilitate comparative studies based on these EST data. Bio301 is equipped with not only an EST annotation pipeline but also tools for functional comparison.
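The idea of comparing EST libraries through their GO annotations can be illustrated with a minimal sketch. The text above does not say which statistical test Bio301 applies, so the example below assumes a plain Pearson chi-square test on a 2x2 table (ESTs with and without a given GO term, in each of two libraries). The function names, the input format (one GO term ID per annotated EST), and the sample GO IDs are all hypothetical.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin means the table carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def compare_libraries(lib1, lib2):
    """Score every GO term by how differently it is represented in two libraries.

    lib1, lib2: lists of GO term IDs, one entry per annotated EST
    (an assumed input format, not one documented for Bio301).
    Returns a dict mapping GO term ID -> chi-square statistic.
    """
    counts1, counts2 = Counter(lib1), Counter(lib2)
    n1, n2 = len(lib1), len(lib2)
    results = {}
    for term in sorted(set(counts1) | set(counts2)):
        a = counts1[term]  # ESTs with this term in library 1
        c = counts2[term]  # ESTs with this term in library 2
        results[term] = chi_square_2x2(a, n1 - a, c, n2 - c)
    return results


# Illustrative data: two libraries of 100 annotated ESTs each.
gill = ["GO:0006811"] * 30 + ["GO:0008152"] * 70
intestine = ["GO:0006811"] * 10 + ["GO:0008152"] * 90
scores = compare_libraries(gill, intestine)
```

A statistic above 3.84 corresponds to p < 0.05 at one degree of freedom. A real analysis over many GO terms would also need a correction for multiple testing, which this sketch omits.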
Bio301 has five characteristics that we consider essential for EST analysis tools that aid functional comparative studies: accurate preprocessing, advanced functional annotation methods, flexibility in comparing multiple EST libraries, retrieval of EST data with respect to the annotation ontology, and an integrated online EST service open to the entire research community. First, Bio301 preprocesses ESTs accurately by cleaning, clustering, and assembling them. These tasks are very important because accurate preprocessing leads to accurate functional annotation, which is crucial for functional comparison studies. Bio301 uses one of the best programs for sequence cleaning, SeqClean (http://compbio.dfci.harvard.edu/tgi/software/). Likewise, Bio301 uses state-of-the-art programs for clustering and assembly, TGICL and CAP3 [5, 6]. Since reference genomes with extensive genome annotation have been shown to be

References

[1]  M. D. Adams, J. M. Kelley, J. D. Gocayne et al., “Complementary DNA sequencing: expressed sequence tags and human genome project,” Science, vol. 252, no. 5013, pp. 1651–1656, 1991.
[2]  E. D. Neto, R. G. Correa, S. Verjovski-Almeida et al., “Shotgun sequencing of the human transcriptome with ORF expressed sequence tags,” Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 7, pp. 3491–3496, 2000.
[3]  L. Hillier, G. Lennon, M. Becker et al., “Generation and analysis of 280,000 human expressed sequence tags,” Genome Research, vol. 6, no. 9, pp. 807–828, 1996.
[4]  T. G. Wolfsberg and D. Landsman, “Expressed sequence tags (ESTs),” in Bioinformatics, A. D. Baxevanis and B. F. F. Ouellette, Eds., pp. 283–301, John Wiley & Sons, New York, NY, USA, 2001.
[5]  G. Pertea, X. Huang, F. Liang et al., “TIGR gene indices clustering tools (TGICL): a software system for fast clustering of large EST datasets,” Bioinformatics, vol. 19, no. 5, pp. 651–652, 2003.
[6]  X. Huang and A. Madan, “CAP3: a DNA sequence assembly program,” Genome Research, vol. 9, no. 9, pp. 868–877, 1999.
[7]  B. Waegele, T. Schmidt, H. W. Mewes, and A. Ruepp, “OREST: the online resource for EST analysis,” Nucleic Acids Research, vol. 36, pp. W140–144, 2008.
[8]  N. Kim, S. Shin, and S. Lee, “ECgene: genome-based EST clustering and gene modeling for alternative splicing,” Genome Research, vol. 15, no. 4, pp. 566–576, 2005.
[9]  W. J. Kent, “BLAT—the BLAST-like alignment tool,” Genome Research, vol. 12, no. 4, pp. 656–664, 2002.
[10]  G. H. Lee, et al., “A novel tool for annotating protein domains in expressed sequence tags,” in Proceedings of the IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, D. Ashlock, Ed., pp. 422–427, Toronto, Canada, September 2006.
[11]  R. Apweiler, T. K. Attwood, A. Bairoch et al., “The InterPro database, an integrated documentation resource for protein families, domains and functional sites,” Nucleic Acids Research, vol. 29, no. 1, pp. 37–40, 2001.
[12]  W. J. Conover, Practical Nonparametric Statistics, John Wiley & Sons, New York, NY, USA, 1999.
[13]  R. Schmid and M. L. Blaxter, “annot8r: GO, EC and KEGG annotation of EST datasets,” BMC Bioinformatics, vol. 9, article 180, 2008.
[14]  Z. Tang, J. H. Choi, C. Hemmerich, A. Sarangi, J. K. Colbourne, and Q. Dong, “ESTPiper—a web-based analysis pipeline for expressed sequence tags,” BMC Genomics, vol. 10, article 174, 2009.
[15]  J. Forment, F. Gilabert, A. Robles, V. Conejero, F. Nuez, and J. M. Blanca, “EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration,” BMC Bioinformatics, vol. 9, article 5, 2008.
[16]  Z. Chen, W. Wang, X. B. Ling, J. J. Liu, and L. Chen, “GO-Diff: mining functional differentiation between EST-based transcriptomes,” BMC Bioinformatics, vol. 7, article 72, 2006.
[17]  M. Montreuil and R. Jouvent, “Bibliometric modeling processes and the empirical validity of Lotka's law,” Journal of the American Society for Information Science, vol. 40, pp. 379–385, 1989.
[18]  P. P. Hwang and T. H. Lee, “New insights into fish ion regulation and mitochondrion-rich cells,” Comparative Biochemistry and Physiology—A Molecular and Integrative Physiology, vol. 148, no. 3, pp. 479–497, 2007.
[19]  H. Hagen-Larsen, J. K. Laerdahl, F. Panitz, A. Adzhubei, and B. Høyheim, “An EST-based approach for identifying genes expressed in the intestine and gills of pre-smolt Atlantic salmon (Salmo salar),” BMC Genomics, vol. 6, article 171, 2005.
[20]  S. Putta, J. J. Smith, J. A. Walker et al., “From biomedicine to natural history research: EST resources for ambystomatid salamanders,” BMC Genomics, vol. 5, article 54, 2004.
