High-throughput biology technologies have yielded complete genome sequences and functional genomics data for several organisms, including crucial microbial pathogens of humans, animals and plants. However, up to 50% of the genes within a genome are often labeled “unknown”, “uncharacterized” or “hypothetical”, which limits our understanding of the virulence and pathogenicity of these organisms. Even though the biological functions of the proteins encoded by these genes are not known, many of them have been predicted to be involved in key processes in these organisms. In particular, for Mycobacterium tuberculosis, some of these “hypothetical” proteins, for example those belonging to the Pro-Glu (PE) or Pro-Pro-Glu (PPE) families, are suspected of playing a crucial role in the intracellular lifestyle of this pathogen and may contribute to its survival in different environments. We have generated a functional interaction network for Mycobacterium tuberculosis proteins and used it to predict functions for many of its hypothetical proteins. Here we performed functional enrichment analysis of these proteins, based on their predicted biological functions, to identify annotations that are statistically significant, and we analysed the network properties of hypothetical proteins and compared them with those of known proteins. From the statistically significant annotations and the network information, we sought to derive biologically meaningful annotations related to infection and disease. This quantitative analysis provides an overview of the functional contributions of Mycobacterium tuberculosis “hypothetical” proteins to many basic cellular functions, including the pathogen's adaptability in the host system and its ability to evade the host immune response.
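As an illustration of what a functional enrichment analysis computes, the following minimal sketch scores a single annotation term with a one-sided hypergeometric test. This is an assumption made for illustration: the abstract does not state which test was used. The function name `hypergeom_enrichment_pvalue`, its parameters, and the counts in the usage example are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """One-sided hypergeometric p-value for over-representation of a term.

    population: number of proteins in the background (e.g. the whole proteome)
    annotated:  number of background proteins carrying the annotation term
    sample:     number of proteins in the set of interest
    hits:       number of proteins in the set of interest carrying the term

    Returns P(X >= hits), where X counts annotated proteins among `sample`
    proteins drawn without replacement from the background.
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie between 0 and population")

    lower = max(hits, 0)
    upper = min(annotated, sample)  # X cannot exceed either count
    total = comb(population, sample)

    # Sum the upper tail of the hypergeometric distribution exactly,
    # using integer arithmetic until the final division.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts chosen only to show the call; not data from the paper.
    p = hypergeom_enrichment_pvalue(population=4000, annotated=120, sample=300, hits=20)
    print(f"enrichment p-value: {p:.3g}")
```

In practice one such p-value would be computed per annotation term and then adjusted for multiple testing before a term is called significant; the choice of correction is likewise left open here.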