Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics

DOI: 10.1371/journal.pcbi.1004714

Abstract:

Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the sum and the maximum of chi-squared statistics, which measure the average and the strongest association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which not only offers a significant power improvement over more traditional enrichment strategies, but also eliminates the problem of arbitrary threshold selection inherent in any pathway enrichment approach based on binary membership. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control, but also results in more biologically meaningful discoveries.
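
To make the two scoring steps concrete, the following is a minimal sketch and not Pascal's implementation. It assumes that SNPs within a gene are independent and that gene scores within a pathway are independent, so it ignores the linkage disequilibrium that the abstract's analytic and numerical solutions are meant to handle, and it uses the standard Fisher method rather than the modified one. All function names are hypothetical, and p-values are assumed to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def chi2_from_p(p):
    """Convert a two-sided SNP p-value into a 1-df chi-squared statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def gene_statistics(snp_pvalues):
    """Return (sum, max) of the per-SNP chi-squared statistics for one gene.

    The sum reflects the average association signal across the gene,
    the maximum reflects the single strongest signal.
    """
    stats = [chi2_from_p(p) for p in snp_pvalues]
    return sum(stats), max(stats)


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable with even df.

    Uses the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!, with k = df / 2.
    """
    k = df // 2
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_pathway_pvalue(gene_pvalues):
    """Standard Fisher combination of gene p-values (independence assumed)."""
    stat = -2.0 * sum(math.log(p) for p in gene_pvalues)
    return chi2_sf_even_df(stat, 2 * len(gene_pvalues))


if __name__ == "__main__":
    gene_sum, gene_max = gene_statistics([0.05, 0.5, 0.2])
    print(gene_sum, gene_max)
    print(fisher_pathway_pvalue([0.01, 0.3, 0.04]))
```

Because every gene p-value enters the Fisher statistic directly, no significance cutoff is needed to decide which genes count as "hits", which is the threshold-free property the abstract contrasts with binary enrichment approaches.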
