Most of the genetic architecture of schizophrenia (SCZ) has not yet been identified. Here, we apply a novel statistical algorithm called Covariate-Modulated Mixture Modeling (CM3), which incorporates auxiliary information (heterozygosity, total linkage disequilibrium, genomic annotations, pleiotropy) for each single nucleotide polymorphism (SNP) to enable more accurate estimation of replication probabilities, conditional on the observed test statistic (“z-score”) of the SNP. We use a multiple logistic regression on z-scores to combine the auxiliary information into a “relative enrichment score” for each SNP. For each stratum of these relative enrichment scores, we obtain nonparametric estimates of posterior expected test statistics and replication probabilities as a function of discovery z-scores, using a resampling-based approach that repeatedly and randomly partitions meta-analysis sub-studies into training and replication samples. We fit a scale mixture of two Gaussians to each stratum, obtaining parameter estimates that minimize the sum of squared differences between the scale-mixture model and the stratified nonparametric estimates. We apply this approach to the recent genome-wide association study (GWAS) of SCZ (n = 82,315), obtaining a good fit between the model-based and observed effect sizes and replication probabilities. We observed that SNPs with low enrichment scores replicate with a lower probability than SNPs with high enrichment scores, even when both are genome-wide significant (p < 5 × 10⁻⁸). There were 693 and 219 independent loci with model-based replication rates ≥80% and ≥90%, respectively. Compared to analyses not incorporating relative enrichment scores, CM3 increased out-of-sample yield for SNPs that replicate at a given rate. This demonstrates that replication probabilities can be more accurately estimated using prior enrichment information with CM3.
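To illustrate why the same discovery z-score can imply different replication probabilities in different enrichment strata, the following is a minimal sketch of a two-component Gaussian scale mixture. It is not the CM3 implementation. It assumes a simplified parameterization in which a SNP is null with probability 1 − pi1 (z-score variance 1) or non-null with probability pi1 (z-score variance 1 + s1_sq), a replication sample of the same size as the discovery sample, and a working definition of replication as a sign-concordant replication z-score exceeding a threshold. The function names, the parameters `pi1` and `s1_sq`, and the threshold of 1.96 are all hypothetical choices made for illustration.

```python
import math


def normal_pdf(z, var):
    """Density of a zero-mean Gaussian with variance `var`, evaluated at z."""
    return math.exp(-0.5 * z * z / var) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def posterior_nonnull_prob(z, pi1, s1_sq):
    """P(SNP is non-null | z) under the assumed two-component mixture."""
    null = (1.0 - pi1) * normal_pdf(z, 1.0)
    alt = pi1 * normal_pdf(z, 1.0 + s1_sq)
    return alt / (null + alt)


def posterior_expected_effect(z, pi1, s1_sq):
    """Posterior mean of the true (noise-free) z-score given the observed z."""
    shrink = s1_sq / (1.0 + s1_sq)  # shrinkage within the non-null component
    return posterior_nonnull_prob(z, pi1, s1_sq) * shrink * z


def replication_prob(z, pi1, s1_sq, threshold=1.96):
    """P(replication z-score has the same sign as z and exceeds `threshold`).

    Assumes an independent replication sample of equal size (an assumption
    of this sketch, not a detail taken from the study).
    """
    p1 = posterior_nonnull_prob(z, pi1, s1_sq)
    shrink = s1_sq / (1.0 + s1_sq)
    abs_z = abs(z)
    # Null component: the replication z-score is pure noise, N(0, 1).
    p_null = 1.0 - normal_cdf(threshold)
    # Non-null component: replication z-score ~ N(shrink * z, 1 + shrink).
    p_alt = 1.0 - normal_cdf((threshold - shrink * abs_z) / math.sqrt(1.0 + shrink))
    return (1.0 - p1) * p_null + p1 * p_alt


if __name__ == "__main__":
    z = 5.5  # just beyond genome-wide significance
    for label, pi1 in [("low-enrichment stratum", 0.001), ("high-enrichment stratum", 0.2)]:
        print(label, round(replication_prob(z, pi1, s1_sq=4.0), 3))
```

Under these assumptions, a stratum with a larger non-null proportion gives a higher replication probability at the same discovery z-score, which mirrors the qualitative pattern described above. In the study itself the mixture parameters are fitted per stratum to nonparametric resampling estimates rather than fixed by hand as they are here.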