Large-scale high-throughput sequencing techniques are rapidly becoming popular methods for profiling complex communities and have generated deep insights into community biodiversity. However, several technical problems, especially sequencing artifacts such as nucleotide calling errors, can artificially inflate biodiversity estimates. Sequence filtering for artifact removal is a conventional method for deleting error-prone sequences from high-throughput sequencing data. Although rare species represented by low-abundance sequences in datasets may be sensitive to the artifact removal process, the influence of artifact removal on rare species recovery has not been well evaluated in natural complex communities. Here we employed both internal references (reliable operational taxonomic units selected from the communities themselves) and external references (indicator species spiked into the communities) to evaluate the influence of artifact removal on rare species recovery, using 454 pyrosequencing of complex plankton communities collected from both freshwater and marine habitats. Multiple analyses revealed three clear patterns: 1) rare species were eliminated during the sequence filtering process at all tested filtering stringencies, 2) more rare taxa were eliminated as filtering stringency increased, and 3) elimination of rare species intensified as the biomass of a species in a community decreased. Our results suggest that caution should be applied when processing high-throughput sequencing data, especially when detecting rare taxa for the conservation of species at risk and for rapid response programs targeting non-indigenous species. The establishment of both internal and external references proposed here provides a practical strategy for evaluating the artifact removal process.
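The reference-based evaluation described above can be illustrated with a minimal sketch. It assumes that filtering stringency is a minimum mean Phred quality score per read, which is only one of several possible filtering criteria and is not necessarily the one used in this study. The function names, the read records, and the OTU labels are all hypothetical and exist only for illustration.

```python
from statistics import mean


def filter_reads(reads, min_mean_quality):
    """Keep reads whose mean Phred quality is at least min_mean_quality.

    Assumption: stringency is expressed as a mean-quality cutoff.
    """
    return [r for r in reads if mean(r["quals"]) >= min_mean_quality]


def reference_recovery(reads, reference_otus, thresholds):
    """Return the fraction of reference OTUs still represented by at
    least one read after filtering, for each stringency threshold."""
    recovery = {}
    for threshold in thresholds:
        kept_otus = {r["otu"] for r in filter_reads(reads, threshold)}
        recovery[threshold] = len(kept_otus & reference_otus) / len(reference_otus)
    return recovery


# Hypothetical toy data: one abundant OTU, one rare internal reference OTU,
# and one spiked-in external reference OTU.
reads = [
    {"otu": "abundant_A", "quals": [35, 36, 34]},
    {"otu": "abundant_A", "quals": [30, 31, 29]},
    {"otu": "abundant_A", "quals": [22, 24, 23]},
    {"otu": "rare_B", "quals": [26, 27, 25]},
    {"otu": "spiked_C", "quals": [21, 20, 22]},
]
reference_otus = {"abundant_A", "rare_B", "spiked_C"}

recovery = reference_recovery(reads, reference_otus, thresholds=[20, 25, 30])
for threshold, fraction in recovery.items():
    print(f"min mean quality {threshold}: {fraction:.2f} of reference OTUs recovered")
```

In this toy example the abundant OTU survives every threshold because it is represented by several reads, whereas each low-abundance reference OTU disappears as soon as its single read falls below the cutoff. This is the kind of loss that internal and external references are meant to expose.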