DNA sequence variation within human leukocyte antigen (HLA) genes mediates susceptibility to a wide range of human diseases. However, the complex genetic structure of the major histocompatibility complex (MHC) makes it difficult to collect HLA genotyping data in large cohorts. Long-range linkage disequilibrium between HLA loci and SNP markers across the MHC region offers an alternative: imputation can be used to interrogate HLA variation in existing GWAS data sets. Here we describe a computational strategy, SNP2HLA, to impute classical alleles and amino acid polymorphisms at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci. To characterize the performance of SNP2HLA, we constructed two European ancestry reference panels, one based on data collected in HapMap-CEPH pedigrees (90 individuals) and another based on data collected by the Type 1 Diabetes Genetics Consortium (T1DGC, 5,225 individuals). We imputed HLA alleles in an independent data set from the British 1958 Birth Cohort (N = 918) with gold-standard four-digit HLA types and SNPs genotyped using the Affymetrix GeneChip 500K and Illumina Immunochip microarrays. We demonstrate that the sample size of the reference panel, rather than the SNP density of the genotyping platform, is critical to achieving high imputation accuracy. Using the larger T1DGC reference panel, the average accuracy at four-digit resolution is 94.7% with the low-density Affymetrix GeneChip 500K and 96.7% with the high-density Illumina Immunochip. For amino acid polymorphisms within HLA genes, we achieve 98.6% and 99.3% accuracy using the Affymetrix GeneChip 500K and Illumina Immunochip, respectively. Finally, we demonstrate how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals, giving a specific example from type 1 diabetes.
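To make the accuracy figures above concrete, the following is a minimal sketch of how allele-level concordance between imputed and gold-standard HLA types might be computed at a single locus. The function name `allele_concordance`, the dictionary layout, and the toy genotypes are all hypothetical, and this best-guess, phase-agnostic definition (matched alleles divided by twice the number of samples) is an assumption for illustration; it is not necessarily the accuracy measure used by SNP2HLA.

```python
from collections import Counter


def allele_concordance(imputed, typed):
    """Fraction of gold-standard alleles recovered by imputation at one locus.

    imputed, typed: dicts mapping a sample ID to a pair of allele names.
    Phase is ignored: each pair is compared as an unordered multiset.
    (Hypothetical helper; the definition is an assumption, see lead-in.)
    """
    shared = set(imputed) & set(typed)
    if not shared:
        raise ValueError("no samples in common")
    matched = 0
    for sample in shared:
        # Multiset intersection counts each correctly imputed allele once,
        # so a homozygous call only scores 2 if the truth is homozygous too.
        overlap = Counter(imputed[sample]) & Counter(typed[sample])
        matched += sum(overlap.values())
    return matched / (2 * len(shared))


# Toy data (invented for illustration only).
imputed = {
    "s1": ("A*02:01", "A*01:01"),
    "s2": ("A*03:01", "A*03:01"),
    "s3": ("A*24:02", "A*02:01"),
}
typed = {
    "s1": ("A*01:01", "A*02:01"),
    "s2": ("A*03:01", "A*11:01"),
    "s3": ("A*24:02", "A*02:01"),
}

accuracy = allele_concordance(imputed, typed)
print(f"{accuracy:.3f}")
```

In this toy example, five of the six gold-standard alleles are recovered. An average over loci, as reported above, would repeat the calculation per locus and take the mean.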