Chinese Science Bulletin (科学通报), 2013

A Binning Method for Metagenomic Sequencing Fragments Based on Correlation Features

DOI: 10.1360/972012-993, PP. 2854-2860

Ding Xiao (丁啸), Zhang Qianqian (张倩倩), Cao Changchang (曹唱唱), Sun Xiao (孙啸)

Keywords: metagenome, binning, correlation features, machine learning


Abstract:

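The concept of metagenomics was first proposed at the end of the 20th century, opening the door to studying microorganisms with metagenomic methods and techniques. As high-throughput sequencing technology has matured, metagenomics has become a rapidly growing field. Sequence analysis is the foundation of metagenomic research, and one of its important steps is the binning of sequencing fragments. Binning accuracy directly affects the precision and efficiency of metagenomic studies, and the key to improving it is to extract a sequence feature that reflects the taxonomic origin of metagenomic sequencing fragments. Current mainstream binning methods all rely on base-composition features of genomic sequences. This paper studies the correlation features of sequences in depth and proposes a binning method based on them, combined with machine learning algorithms to achieve accurate binning. The method maintains good performance when binning simulated metagenomic sequencing datasets at different taxonomic levels and of different complexity. In comparison, its accuracy and stability are better than those of current international unsupervised binning algorithms and of algorithms that bin using only trinucleotide or tetranucleotide frequencies.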
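To make the contrast between composition features and correlation features concrete, here is a minimal Python sketch. The tetranucleotide frequency vector is the standard composition feature mentioned in the abstract. The abstract does not define the paper's correlation features, so `spaced_pair_frequencies` is only a hypothetical stand-in that counts how often two bases co-occur at a fixed distance. All function names and parameters are illustrative assumptions, not the authors' method.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_frequencies(fragment, k=4):
    """Composition feature: relative frequency of each k-mer over A/C/G/T."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = Counter(fragment[i:i + k] for i in range(len(fragment) - k + 1))
    # Only k-mers made of A/C/G/T contribute; windows containing N are skipped.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def spaced_pair_frequencies(fragment, distance=2):
    """Hypothetical correlation-style feature (an assumption, not the paper's
    definition): joint frequency of the bases at positions i and i + distance."""
    pairs = [a + b for a in BASES for b in BASES]
    counts = Counter(
        fragment[i] + fragment[i + distance]
        for i in range(len(fragment) - distance)
    )
    total = sum(counts[p] for p in pairs)
    return [counts[p] / total if total else 0.0 for p in pairs]


def feature_vector(fragment, k=4, distances=(1, 2, 3)):
    """Concatenate the composition feature with spaced-pair features."""
    vec = kmer_frequencies(fragment, k)
    for d in distances:
        vec.extend(spaced_pair_frequencies(fragment, d))
    return vec
```

A vector like the one returned by `feature_vector` could then be passed to any classifier or clustering algorithm. The abstract says only that machine learning algorithms are used and does not specify which.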
