Advanced statistical methods used to analyze high-throughput data, such as gene-expression assays, produce long lists of “significant genes.” One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene set and is widely used to make sense of the results of high-throughput experiments. The canonical example of enrichment analysis is one in which the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO: we aggregate the GO concepts annotating each gene in the list and arrive at a profile of the biological processes or mechanisms affected by the condition under study.
While GO has been the principal target for enrichment analysis, the methods of enrichment analysis are generalizable, and we can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask “Which biological process is over-represented in my set of interesting genes or proteins?” we can also ask “Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?” For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases (blood coagulation disorders) that was associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO.
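To make the idea of over-representation concrete, the following minimal sketch tests each ontology term with a one-sided hypergeometric test, one common choice for enrichment analysis. It is an illustration under stated assumptions rather than the method of any particular tool: the function names (`hypergeom_upper_tail`, `enrich`), the gene identifiers, and the term-to-gene annotations are all hypothetical, and the toy data are invented for the example.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn without replacement from a
    background of N genes, K of which are annotated with the term."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrich(study_genes, background_genes, term_to_genes):
    """Return (term, overlap, annotated, p_value) tuples, smallest p first."""
    background = set(background_genes)
    study = set(study_genes) & background  # ignore genes outside the background
    results = []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        overlap = len(annotated & study)
        p = hypergeom_upper_tail(overlap, len(annotated), len(study), len(background))
        results.append((term, overlap, len(annotated), p))
    return sorted(results, key=lambda r: r[3])


# Hypothetical toy data: 20 background genes, 5 "significant" genes.
background = [f"g{i}" for i in range(1, 21)]
study = ["g1", "g2", "g3", "g4", "g5"]
annotations = {
    # Term names and gene assignments are invented for illustration only.
    "blood coagulation disorder": ["g1", "g2", "g3", "g4", "g10"],
    "hypothetical term B": ["g5"] + [f"g{i}" for i in range(11, 20)],
}

for term, overlap, annotated, p in enrich(study, background, annotations):
    print(f"{term}: {overlap}/{annotated} annotated genes in study set, p = {p:.4g}")
```

Because nothing in the sketch depends on the GO, the same function works for disease terms or any other ontology for which term-to-gene annotations exist. A real analysis would also need to correct the resulting p-values for multiple testing and decide how to treat the ontology hierarchy, neither of which is handled here.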
In this chapter, we will discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We will review the general methodology of enrichment analysis and its associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.
Brazma A, Vilo J (2000) Gene expression data analysis. FEBS Lett 480: 17–24. doi: 10.1016/s0014-5793(00)01772-5
[3] Quackenbush J (2002) Microarray data normalization and transformation. Nat Genet 32 Suppl: 496–501. doi: 10.1038/ng1032
[4] Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98: 5116–5121. doi: 10.1073/pnas.091062498
[5] Huttenhower C, Hibbs M, Myers C, Troyanskaya OG (2006) A scalable method for integration and functional analysis of multiple microarray datasets. Bioinformatics 22: 2890–2897. doi: 10.1093/bioinformatics/btl492
[6] Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, et al. (2003) GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol 4: R28.
[7] Rhee SY, Wood V, Dolinski K, Draghici S (2008) Use and misuse of the gene ontology annotations. Nat Rev Genet 9: 509–515. doi: 10.1038/nrg2363
[8] Shah NH, Jonquet C, Chiang AP, Butte AJ, Chen R, et al. (2009) Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics 10 Suppl 2: S1. doi: 10.1186/1471-2105-10-s2-s1
[9] Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102: 15545–15550. doi: 10.1073/pnas.0506580102
[10] Draghici S, Khatri P, Martins RP, Ostermeier GC, Krawetz SA (2003) Global functional profiling of gene expression. Genomics 81: 98–104. doi: 10.1016/s0888-7543(02)00021-6
[11] (2002) Gene ontology consortium website.
[12] Alexa A, Rahnenfuhrer J, Lengauer T (2006) Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22: 1600–1607. doi: 10.1093/bioinformatics/btl140
[13] Grossmann S, Bauer S, Robinson PN, Vingron M (2007) Improved detection of overrepresentation of Gene-Ontology annotations with parent child analysis. Bioinformatics 23: 3024–3031. doi: 10.1093/bioinformatics/btm440
[14] Schlicker A, Rahnenfuhrer J, Albrecht M, Lengauer T, Domingues FS (2007) GOTax: investigating biological processes and biochemical activities along the taxonomic tree. Genome Biol 8: R33. doi: 10.1186/gb-2007-8-3-r33
[15] Farcomeni A (2008) A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion. Stat Methods Med Res 17: 347–388. doi: 10.1177/0962280206079046
[16] Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29: 1165–1188. doi: 10.1214/aos/1013699998
[17] Khatri P, Draghici S (2005) Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21: 3587–3595. doi: 10.1093/bioinformatics/bti565
[18] Shah NH, Fedoroff NV (2004) CLENCH: a program for calculating Cluster ENriCHment using the Gene Ontology. Bioinformatics 20: 1196–1197. doi: 10.1093/bioinformatics/bth056
[19] Ade AS, States DJ, Wright ZC (2007) Genes2Mesh. Ann Arbor, MI: National Center for Integrative Biomedical Informatics.
[20] Mort M, Evani US, Krishnan VG, Kamati KK, Baenziger PH, et al. (2010) In silico functional profiling of human disease-associated and polymorphic amino acid substitutions. Human Mutation 31: 335–346. doi: 10.1002/humu.21192
[21] Spackman KA (2004) SNOMED CT milestones: endorsements are added to already-impressive standards credentials. Healthc Inform 21: 54, 56.
[22] Smith B, Ashburner M, Rosse C, Bard J, Bug W, et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol 25: 1251–1255. doi: 10.1038/nbt1346
[23] Noy NF, Shah NH, Whetzel PL, Dai B, Dorf M, et al. (2009) BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res 37(Web Server issue): W170–W173. doi: 10.1093/nar/gkp440
Shah NH, Bhatia N, Jonquet C, Rubin D, Chiang AP, et al. (2009) Comparison of concept recognizers for building the Open Biomedical Annotator. BMC Bioinformatics 10 Suppl 9: S14. doi: 10.1186/1471-2105-10-s9-s14
[26] Jonquet C, Shah NH, Musen MA (2009) The Open Biomedical Annotator; 2009 March 15–17; San Francisco, CA. pp. 56–60.
[27] (2010) NCBO REST services.
[28] Osborne JD, Flatow J, Holko M, Lin SM, Kibbe WA, et al. (2009) Annotating the human genome with Disease Ontology. BMC Genomics 10 Suppl 1: S6. doi: 10.1186/1471-2164-10-s1-s6
[29] Boyle EI, Weng S, Gollub J, Jin H, Botstein D, et al. (2004) TermFinder–open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 20: 3710–3715. doi: 10.1093/bioinformatics/bth456
[30] Toronen P, Pehkonen P, Holm L (2009) Generation of Gene Ontology benchmark datasets with various types of positive signal. BMC Bioinformatics 10: 319. doi: 10.1186/1471-2105-10-319
[31] de Magalhaes JP, Budovsky A, Lehmann G, Costa J, Li Y, et al. (2009) The Human Ageing Genomic Resources: online databases and tools for biogerontologists. Aging Cell 8: 65–72. doi: 10.1111/j.1474-9726.2008.00442.x
[32] Alterovitz G, Xiang M, Hill DP, Lomax J, Liu J, et al. (2010) Ontology engineering. Nat Biotechnol 28: 128–130. doi: 10.1038/nbt0210-128
[33] Altman RB, Bergman CM, Blake J, Blaschke C, Cohen A, et al. (2008) Text mining for biology–the way forward: opinions from leading scientists. Genome Biol 9 Suppl 2: S7. doi: 10.1186/gb-2008-9-s2-s7
[34] Goh KI, Cusick ME, Valle D, Childs B, Vidal M, et al. (2007) The human disease network. Proc Natl Acad Sci U S A 104: 8685–8690. doi: 10.1073/pnas.0701361104
[35] Lependu P, Musen MA, Shah NH (2011) Enabling enrichment analysis with the Human Disease Ontology. J Biomed Inform 44 Suppl 1: S31–S38. doi: 10.1016/j.jbi.2011.04.007
[36] Krallinger M, Leitner F, Valencia A (2010) Analysis of biological processes and diseases using text mining approaches. Methods Mol Biol 593: 341–382. doi: 10.1007/978-1-60327-194-3_16
[37] Sarkar N (2010) Using biomedical ontologies to enable morphology based phylogenetics: a feasibility study for fishes; 2010; Boston, MA.
[38] Xu R, Musen MA, Shah NH (2010) A comprehensive analysis of five million UMLS metathesaurus terms using eighteen million MEDLINE citations. AMIA Annual Symposium proceedings/AMIA Symposium 2010: 907–911.
[39] Tirrell R, Evani U, Berman AE, Mooney SD, Musen MA, et al. (2010) An ontology-neutral framework for enrichment analysis. AMIA Annual Symposium proceedings/AMIA Symposium 2010: 797–801.