The genome-wide association study (GWAS) approach has discovered hundreds of genetic variants associated with diseases and quantitative traits. However, despite clinical overlap and statistical correlation between many phenotypes, GWAS are generally performed one phenotype at a time. Here we compare the performance of modelling multiple phenotypes jointly with that of the standard univariate approach. We introduce a new method and software, MultiPhen, that models multiple phenotypes simultaneously in a fast and interpretable way. By performing ordinal regression, MultiPhen tests the linear combination of phenotypes most associated with the genotypes at each SNP, and thus potentially captures effects hidden from single-phenotype GWAS. We demonstrate via simulation that this approach provides a dramatic increase in power in many scenarios. Power increases both for variants that affect multiple phenotypes and for those that affect only one phenotype. While other multivariate methods have similar power gains, we describe several benefits of MultiPhen over these. In particular, we demonstrate that other multivariate methods that assume the genotypes are normally distributed, such as canonical correlation analysis (CCA) and MANOVA, can have highly inflated type 1 error rates when testing case-control or non-normal continuous phenotypes, while MultiPhen produces no such inflation. To test the performance of MultiPhen on real data, we applied it to lipid traits in the Northern Finland Birth Cohort 1966 (NFBC1966). In these data MultiPhen discovers 21% more independent SNPs with known associations than the standard univariate GWAS approach, while applying MultiPhen in addition to the standard approach increases discovery by 37%.
The most associated linear combinations of the lipids estimated by MultiPhen at the leading SNPs accurately reflect the Friedewald Formula, suggesting that MultiPhen could be used to refine the definition of existing phenotypes or uncover novel heritable phenotypes.
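As a small illustration of what a "linear combination of the lipids" means here, the sketch below writes the Friedewald formula in its standard form, LDL = TC - HDL - TG/5 with all values in mg/dL, both directly and as a weight vector over (total cholesterol, HDL, triglycerides). The function and variable names are hypothetical and chosen for this example only; they are not part of the MultiPhen software, and the weights shown are the textbook Friedewald coefficients, not estimates produced by MultiPhen.

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL cholesterol (mg/dL) with the Friedewald formula.

    All inputs are assumed to be in mg/dL; TG / 5 approximates
    VLDL cholesterol in those units.
    """
    return total_chol - hdl - triglycerides / 5.0


# The same formula expressed as weights on (TC, HDL, TG).
# A fitted linear combination of lipid traits can be compared
# against this vector, up to an overall scale factor.
FRIEDEWALD_WEIGHTS = (1.0, -1.0, -0.2)


def linear_combination(weights, values):
    """Return the weighted sum of the given phenotype values."""
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    lipids = (200.0, 50.0, 150.0)  # illustrative TC, HDL, TG in mg/dL
    print(friedewald_ldl(*lipids))
    print(linear_combination(FRIEDEWALD_WEIGHTS, lipids))
```

Both calls compute the same quantity; the weight-vector form is the one that lines up with coefficients estimated from a regression on several phenotypes.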