PLOS Genetics  2015 

Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data

DOI: 10.1371/journal.pgen.1005271


Abstract:

Sequencing DNA samples from families is an attractive alternative to population-based designs for identifying rare variants associated with human disease, because causal variants are enriched in pedigrees. Previous studies showed that modeling family relatedness improves genotype calling accuracy over standard calling algorithms. However, current family-based variant calling methods use sequencing data one variant at a time and ignore identity-by-descent (IBD) sharing along the genome. In this study, we describe a new computational framework that accurately estimates IBD sharing from sequencing data and uses the inferred IBD among family members to jointly call genotypes in pedigrees. Through simulations and application to real data, we show that IBD can be reliably estimated across the genome, even at very low coverage (e.g., 2X), and that genotype accuracy can be dramatically improved. Moreover, the improvement is more pronounced for low-frequency variants, especially at low to intermediate coverage (e.g., 10X to 20X), which makes our approach effective for studying rare variants in cost-effective whole-genome sequencing of pedigrees. We hope that our tool will be useful to the research community for identifying rare variants underlying human disease through family-based sequencing.
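The intuition behind borrowing information across relatives can be illustrated with a toy calculation. The sketch below is not the authors' framework: it assumes a biallelic site, a simple binomial read-error model, a Hardy-Weinberg genotype prior, and two siblings who are already known to share both haplotypes IBD (IBD2) at the site, so that they must carry the same genotype. The function names, the error rate, and the read counts are all hypothetical.

```python
import math

def genotype_log_likelihoods(ref_reads, alt_reads, error=0.01):
    """Log-likelihood of the read counts for genotypes with 0, 1, 2 alt alleles.

    Assumed model: each read independently shows the alt allele with
    probability depending on the genotype and a per-read error rate.
    The binomial coefficient is dropped because it is constant across genotypes.
    """
    logliks = []
    for alt_copies in (0, 1, 2):
        frac = alt_copies / 2
        p_alt = frac * (1 - error) + (1 - frac) * error
        logliks.append(alt_reads * math.log(p_alt) + ref_reads * math.log(1 - p_alt))
    return logliks

def hwe_log_prior(alt_freq):
    """Hardy-Weinberg log-prior over genotypes 0, 1, 2 (an assumption of this sketch)."""
    ref_freq = 1 - alt_freq
    return [math.log(ref_freq ** 2),
            math.log(2 * ref_freq * alt_freq),
            math.log(alt_freq ** 2)]

def call_genotype(logliks, log_prior):
    """Return the genotype (0, 1, or 2) with the highest posterior score."""
    scores = [ll + lp for ll, lp in zip(logliks, log_prior)]
    return max(range(3), key=lambda g: scores[g])

# Hypothetical read counts at a rare variant (alt allele frequency 1%).
prior = hwe_log_prior(0.01)
sib_a = genotype_log_likelihoods(ref_reads=2, alt_reads=0)  # about 2X coverage
sib_b = genotype_log_likelihoods(ref_reads=3, alt_reads=3)  # about 6X coverage

# Calling each sibling on their own reads.
call_a_alone = call_genotype(sib_a, prior)
call_b_alone = call_genotype(sib_b, prior)

# Under assumed IBD2 sharing the siblings have one shared genotype,
# so their read likelihoods multiply (log-likelihoods add).
joint = [a + b for a, b in zip(sib_a, sib_b)]
call_joint = call_genotype(joint, prior)

print(call_a_alone, call_b_alone, call_joint)
```

With these counts, the low-coverage sibling is called homozygous reference from their own two reads, while the joint call assigns the heterozygous genotype supported by the combined evidence. In real data the IBD state is unknown and varies along the genome, which is why the framework described in the abstract estimates it from the sequencing data before calling genotypes.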

