Continuing advances in nucleotide sequencing technology are inspiring a suite of genomic approaches in studies of natural populations. Researchers are faced with data management and analytical scales that are increasing by orders of magnitude. With such dramatic advances comes a need to understand biases and error rates, which can be propagated and magnified in large-scale data acquisition and processing. Here we assess genomic sampling biases and the effects of various population-level data filtering strategies in a genotyping-by-sequencing (GBS) protocol. We focus on data from two species of Populus, because this genus has a relatively small genome and is emerging as a target for population genomic studies. We estimate the proportions and patterns of genomic sampling by examining the Populus trichocarpa genome (Nisqually-1), and demonstrate a pronounced bias towards coding regions when using the methylation-sensitive ApeKI restriction enzyme in this species. Using population-level data from a closely related species (P. tremuloides), we also investigate various approaches for filtering GBS data to retain high-depth, informative SNPs that can be used for population genetic analyses. We find that a data filter that includes the designation of ambiguous alleles yields metrics of population structure and Hardy-Weinberg equilibrium that are most consistent with previous studies of the same populations based on other genetic markers. Analyses of the filtered data (27,910 SNPs) also yield patterns of heterozygosity and population structure similar to those of a previous study using microsatellites. Our application demonstrates that technically and analytically simple approaches can readily be developed for genomic studies of natural populations.
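As an illustration of how genomic sampling by a restriction enzyme might be estimated, the following is a minimal sketch of an in silico digest. It is not the procedure used in this study. It assumes only the published ApeKI recognition site (GCWGC, where W is A or T, cut after the first G); the function names, the toy sequence, and the fragment size window are hypothetical. It does not model methylation sensitivity or read length.

```python
import re

# ApeKI recognizes GCWGC (W = A or T) and cuts after the first G.
# A lookahead is used so that overlapping sites are all reported.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")
CUT_OFFSET = 1  # cut position relative to the start of the site


def digest(sequence):
    """Return the fragments produced by cutting at every ApeKI site."""
    seq = sequence.upper()
    cuts = [m.start() + CUT_OFFSET for m in APEKI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def sampled_fraction(sequence, min_len, max_len):
    """Fraction of bases lying in fragments within a size window.

    The size window is a hypothetical stand-in for whatever fragment
    sizes a given library preparation would actually retain.
    """
    if not sequence:
        return 0.0
    fragments = digest(sequence)
    kept = sum(len(f) for f in fragments if min_len <= len(f) <= max_len)
    return kept / len(sequence)


# Toy sequence (hypothetical) with two ApeKI sites: GCAGC and GCTGC.
toy = "AAAGCAGCTTTTGCTGCAA"
fragments = digest(toy)
fraction = sampled_fraction(toy, 5, 10)
```

Applied to a real genome, the same functions would be run per chromosome or scaffold, and the retained fragments could then be intersected with gene annotations to examine a bias towards coding regions.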