Pooled sequencing can be a cost-effective approach to disease variant discovery, but its applicability to association studies remains unclear. We compare sequence enrichment methods coupled to next-generation sequencing in non-indexed pools of 1, 2, 10, 20 and 50 individuals and assess their ability to discover variants and to estimate allele frequencies. We find that pooled resequencing is best suited to variant discovery, because of limitations in estimating allele frequencies accurately enough for association studies, and that in-solution hybrid capture performs best among the enrichment methods examined regardless of pool size.