PLOS ONE  2012 

Two-Stage Clustering (TSC): A Pipeline for Selecting Operational Taxonomic Units for the High-Throughput Sequencing of PCR Amplicons

DOI: 10.1371/journal.pone.0030230

Abstract:

Clustering 16S/18S rRNA amplicon sequences into operational taxonomic units (OTUs) is a critical step in the bioinformatic analysis of microbial diversity. Here, we report a pipeline for selecting OTUs with relatively low computational demand and high accuracy. The pipeline is called two-stage clustering (TSC) because it divides tags into two groups according to their abundance and clusters the groups sequentially. The more abundant group is clustered using a hierarchical algorithm similar to that in ESPRIT, which is highly accurate but computationally costly for large datasets. The rarer group, which includes the majority of tags, is then clustered heuristically to improve efficiency. To further improve computational efficiency and accuracy, two preclustering steps are implemented. To maintain clustering accuracy, all tags are grouped into OTUs according to their pairwise Needleman-Wunsch distance. This method not only improved computational efficiency but also mitigated the spurious OTU estimation caused by ‘noise’ sequences. In addition, OTUs clustered using TSC performed comparably to or better than those from existing OTU selection methods in beta-diversity comparisons. This study suggests that the abundance distribution of tags in a sequencing dataset is a useful property for improving both the computational efficiency and the clustering accuracy of OTU selection in high-throughput sequencing of PCR amplicons. The software and user guide are freely available at http://hwzhoulab.smu.edu.cn/paperdata/.
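
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. This is not the TSC software itself: the function names, the abundance cutoff of 2, the 0.03 distance threshold, the unit-cost alignment scoring, the use of complete linkage for the abundant group, and the greedy seed-based assignment for the rare group are all assumptions made for illustration. The two preclustering steps mentioned in the abstract are omitted because the abstract does not describe them.

```python
from collections import Counter


def nw_distance(a, b, mismatch=1, gap=1):
    """Global (Needleman-Wunsch) alignment cost with unit penalties,
    normalized by the longer sequence length (an assumed normalization)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            curr.append(min(sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1] / max(len(a), len(b), 1)


def complete_linkage(seqs, threshold):
    """Naive agglomerative clustering: merge the closest pair of clusters
    until the smallest complete-linkage distance exceeds the threshold."""
    clusters = [[s] for s in seqs]

    def link(c1, c2):
        return max(nw_distance(x, y) for x in c1 for y in c2)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        # i < j, so the earlier (more abundant) seed stays at position 0.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def two_stage_cluster(reads, abundance_cutoff=2, threshold=0.03):
    """Hypothetical two-stage OTU picking on a list of read strings."""
    counts = Counter(reads)  # dereplicate reads into unique tags
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    abundant = [s for s in ordered if counts[s] >= abundance_cutoff]
    rare = [s for s in ordered if counts[s] < abundance_cutoff]

    # Stage 1: accurate but costly hierarchical clustering of abundant tags.
    otus = complete_linkage(abundant, threshold)

    # Stage 2: greedy heuristic. Each rare tag joins the first OTU whose
    # seed (first member) is within the threshold, else founds a new OTU.
    for tag in rare:
        for otu in otus:
            if nw_distance(tag, otu[0]) <= threshold:
                otu.append(tag)
                break
        else:
            otus.append([tag])
    return otus
```

A call such as `two_stage_cluster(reads, abundance_cutoff=2, threshold=0.1)` returns a list of OTUs, each a list of unique tags with the seed first. The expensive all-against-all comparisons are confined to the small abundant group, while each rare tag is compared only against OTU seeds, which is the efficiency argument the abstract makes.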
