Computational analyses of the functions of gene sets obtained in microarray analyses or by topical database searches are increasingly important in biology. To understand their functions, the sets are usually mapped to Gene Ontology (GO) knowledge bases by means of over-representation analysis (ORA). The result of the ORA represents the specific knowledge about the functionality of the gene set. However, this specific ontology typically consists of many terms and relationships, which hinders understanding of the “main story”. We developed a methodology to identify a comprehensibly small number of GO terms as “headlines” of the specific ontology, allowing all central aspects of the roles of the involved genes to be understood. The Functional Abstraction method finds a set of headlines that is specific enough to cover all details of a specific ontology and abstract enough for human comprehension. The method outperforms the classical approaches to ORA abstraction and, by focusing on information rather than on decorrelation of GO terms, directly targets human comprehension. Functional Abstraction provides, with a maximum of certainty, information value, coverage and conciseness, a representation of the biological functions in which a gene set plays a role. This provides the necessary means to interpret complex Gene Ontology results, thus strengthening the role of functional genomics in biomarker and drug discovery.
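As an illustration of the ORA step mentioned above, the following minimal sketch computes an over-representation p-value for a single GO term. It assumes a one-sided hypergeometric test, which is a common choice for ORA but is not stated in the text as the test used here; the function name `ora_p_value` and all of its parameters are hypothetical and chosen for illustration only.

```python
from math import comb


def ora_p_value(n_universe, n_term, n_set, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of annotated genes in the reference set (assumed)
    n_term:     number of reference genes annotated to the GO term
    n_set:      size of the gene set under study
    n_overlap:  number of genes in the set annotated to the GO term
    """
    total = comb(n_universe, n_set)      # all ways to draw the gene set
    upper = min(n_term, n_set)           # largest possible overlap
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_set - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 reference genes, 150 annotated to the term,
# a gene set of 100 genes, 8 of which carry the term.
p = ora_p_value(20000, 150, 100, 8)
print(f"p = {p:.3g}")
```

In practice such a test would be repeated for every GO term and the resulting p-values corrected for multiple testing; the significant terms then form the specific ontology from which headlines are selected.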