PLOS ONE  2012 

Limitations of the Human Reference Genome for Personalized Genomics

DOI: 10.1371/journal.pone.0040294


Abstract:

Data from the 1000 Genomes Project (1KGP) and Complete Genomics (CG) have dramatically increased the number of known genetic variants and challenge several assumptions about the reference genome and its uses in both clinical and research settings. Specifically, 34% of published array-based GWAS for a variety of diseases utilize probes that overlap unanticipated single nucleotide polymorphisms (SNPs), indels, or structural variants. Linkage disequilibrium (LD) block length depends on the number of markers used, and the mean LD block size decreases from 16 kb to 7 kb when HapMap-based calculations are compared to blocks computed from 1KGP data. Additionally, when 1KGP and CG variants are compared, 19% of the single nucleotide variants (SNVs) reported from common genomes are unique to one dataset, likely a result of differences in data collection methodology, alignment of reads to the reference genome, and variant-calling algorithms. Together these observations indicate that current research resources and informatics methods do not adequately account for the high level of variation that already exists in the human population, and that significant efforts are needed to create resources that can accurately assess personal genomes for health and disease and predict treatment outcomes.
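
The dataset comparison described in the abstract can be illustrated with a small sketch. It assumes each call set has already been reduced to (chromosome, position, reference allele, alternate allele) tuples for the same genome, and it measures discordance as the share of the combined calls that appear in only one set. The `Variant` type, the `fraction_unique` helper, the toy coordinates, and the choice of denominator are all illustrative assumptions; the abstract does not state how the 19% figure was computed.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """A single nucleotide variant keyed by position and alleles (hypothetical type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def fraction_unique(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> float:
    """Share of the combined calls that are present in only one of the two sets.

    Assumption: the denominator is the union of both call sets. Other
    definitions (for example, per-dataset fractions) are equally plausible.
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 0.0
    # Symmetric difference: variants called in exactly one dataset.
    return len(a ^ b) / len(union)


# Toy data only; these coordinates are made up for illustration.
dataset_a = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 2000, "C", "T"),
    Variant("chr2", 3000, "G", "A"),
    Variant("chr2", 4000, "T", "C"),
]
dataset_b = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 2000, "C", "T"),
    Variant("chr2", 3000, "G", "A"),
    Variant("chr3", 5000, "A", "C"),
]

print(f"{fraction_unique(dataset_a, dataset_b):.0%} of calls are unique to one dataset")
```

Matching on exact position and alleles is the simplest possible criterion. In practice, differences in indel representation or reference build would need to be normalized before two call sets could be compared this way.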

