We present a method to assist in the interpretation of the functional impact of intergenic disease-associated SNPs that is not limited to search strategies restricted to the genomic neighbourhood of the SNP. The method builds on two sources of external knowledge: the growing understanding of three-dimensional spatial relationships in the genome, and the substantial body of information about relationships among genetic variants, genes, and diseases captured in the published biomedical literature. We integrate chromatin conformation capture (Hi-C) data with literature support to rank putative target genes of intergenic disease-associated SNPs. On a small test set of expression quantitative trait loci (eQTLs), this hybrid method outperforms a genomic-distance baseline, as well as either the Hi-C data or the literature support used alone. In addition, we show that the method can potentially uncover relationships between intergenic SNPs and target genes located on other chromosomes. As more extensive chromatin conformation capture data become readily available, this method offers a way forward for the functional interpretation of SNPs in the context of the three-dimensional structure of the genome in the nucleus.
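The abstract does not state how the Hi-C and literature evidence are combined into one ranking. As an illustration only, the sketch below assumes that each candidate gene carries a Hi-C contact score and a literature score. It ranks the candidates on each score independently and averages the two ranks, a simple Borda-style aggregation. A genomic-distance baseline ranks the same candidates by proximity to the SNP. The `Candidate` fields, the scoring, and the gene names are hypothetical, and the paper's actual aggregation may differ.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A putative target gene for one intergenic SNP (all fields hypothetical)."""
    gene: str
    hic_contacts: float      # e.g. normalized Hi-C contact count between the SNP and gene bins
    literature_score: float  # e.g. strength of literature support linking the gene to the disease
    distance_bp: float       # genomic distance to the SNP; math.inf if on another chromosome


def ranks(values: list[float], higher_is_better: bool = True) -> list[float]:
    """Return 1-based ranks, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def hybrid_ranking(cands: list[Candidate]) -> list[str]:
    """Order genes by the mean of their Hi-C rank and literature rank (lower is better)."""
    hic = ranks([c.hic_contacts for c in cands])
    lit = ranks([c.literature_score for c in cands])
    mean_rank = [(h + l) / 2 for h, l in zip(hic, lit)]
    order = sorted(range(len(cands)), key=lambda i: (mean_rank[i], cands[i].gene))
    return [cands[i].gene for i in order]


def distance_baseline(cands: list[Candidate]) -> list[str]:
    """Order genes by genomic distance to the SNP, nearest first."""
    return [c.gene for c in sorted(cands, key=lambda c: (c.distance_bp, c.gene))]


# Toy candidates for a single SNP; names and numbers are invented for illustration.
cands = [
    Candidate("GENE_A", hic_contacts=2.0, literature_score=0.0, distance_bp=5_000),
    Candidate("GENE_B", hic_contacts=40.0, literature_score=12.0, distance_bp=350_000),
    Candidate("GENE_C", hic_contacts=15.0, literature_score=3.0, distance_bp=math.inf),
]

print(hybrid_ranking(cands))     # ['GENE_B', 'GENE_C', 'GENE_A']
print(distance_baseline(cands))  # ['GENE_A', 'GENE_B', 'GENE_C']
```

In this toy case the distance baseline puts the nearest gene, GENE_A, first. The hybrid ranking instead prefers GENE_B, which has both the strongest contact and the most literature support. It also places GENE_C, on another chromosome, above GENE_A, even though a distance-based search would never reach GENE_C. Averaging ranks rather than raw values avoids putting contact counts and literature scores on a common scale. Weighted or copula-based rank aggregation would be alternative choices.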