Inferring Polymorphism-Induced Regulatory Gene Networks Active in Human Lymphocyte Cell Lines by Weighted Linear Mixed Model Analysis of Multiple RNA-Seq Datasets
Single-nucleotide polymorphisms (SNPs) contribute to between-individual variation in the expression of many genes. A regulatory (trait-associated) SNP is usually located near or within a (host) gene and may influence that gene's transcription and/or post-transcriptional modification. However, its targets may also include genes that lie physically farther away. A heuristic explanation for such multiple-target effects is that the host gene transmits the SNP's genotypic effects to the distant gene(s) through a transcriptional or signaling cascade. These connections between host genes (regulators) and distant genes (targets) make the genetic analysis of gene expression traits a promising approach for identifying unknown regulatory relationships. In this study, through a mixed model analysis of multi-source digital expression profiles for 140 human lymphocyte cell lines (LCLs) and the genotypes distributed by the International HapMap Project, we identified about 45,000 potential SNP-induced regulatory relationships among genes (the significance level for the underlying associations between expression traits and SNP genotypes was set at FDR < 0.01). We grouped the identified relationships into four classes (paradigms) according to the two mechanisms by which regulatory SNPs affect their cis- and trans-regulated genes: modifying mRNA levels or altering transcript splicing patterns. We further organized the relationships in each class into a set of network modules with the cis-regulated genes as hubs. We found that the target genes in a network module often shared significant functional similarity, and that the distributions of target genes in three of the four networks roughly follow a power law, a pattern typical of gene networks obtained from mutation experiments. Through two case studies, we also demonstrated that significant biological insights can be inferred from the identified network modules.
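The abstract does not spell out how significant associations become network modules, so the Python sketch below illustrates one plausible version of that bookkeeping under explicit assumptions; it is not the authors' implementation. It assumes that the FDR < 0.01 cut is applied with the Benjamini-Hochberg step-up procedure, that each association is stored as a hypothetical `(snp, host_gene, target_gene, p_value)` tuple, that a module is simply the set of targets attached to one cis-regulated hub gene, and that the power-law check uses module sizes with the discrete maximum-likelihood approximation of Clauset, Shalizi and Newman. All gene and SNP names are placeholders, not results.

```python
import math
from collections import defaultdict


def benjamini_hochberg(pvalues, fdr=0.01):
    """Flag p-values that pass the Benjamini-Hochberg step-up procedure (assumed FDR method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    # Largest rank k with p_(k) <= (k / m) * fdr; every rank <= k is declared significant.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    passed = [False] * m
    for idx in order[:cutoff_rank]:
        passed[idx] = True
    return passed


def build_modules(records, passed):
    """Group significant host_gene -> target_gene links into hub-centred modules (hypothetical layout)."""
    modules = defaultdict(set)
    for (_snp, host, target, _p), ok in zip(records, passed):
        if ok:
            modules[host].add(target)
    return dict(modules)


def power_law_alpha(sizes, xmin=1):
    """Approximate discrete power-law MLE: 1 + n / sum(ln(x / (xmin - 0.5))) over x >= xmin."""
    tail = [x for x in sizes if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / (xmin - 0.5)) for x in tail)


# Hypothetical toy input: (snp, host_gene, target_gene, p_value); not real data.
records = [
    ("rs0001", "HOST_A", "TARGET_X", 1e-8),
    ("rs0001", "HOST_A", "TARGET_Y", 2e-6),
    ("rs0002", "HOST_B", "TARGET_Z", 3e-4),
    ("rs0003", "HOST_C", "TARGET_X", 0.2),
]

passed = benjamini_hochberg([r[3] for r in records], fdr=0.01)
modules = build_modules(records, passed)
sizes = [len(targets) for targets in modules.values()]
alpha = power_law_alpha(sizes, xmin=1)
print(passed)  # [True, True, True, False]
print(modules)
print(round(alpha, 3))  # 1.962
```

An exponent estimated from a toy input this small carries no meaning; with real module-size data one would also choose `xmin` carefully and test goodness of fit before describing a distribution as power-law-like.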