A number of tools for the alignment of protein-protein interaction (PPI) networks have laid the foundation for PPI network analysis. Most of alignment tools focus on finding conserved interaction regions across the PPI networks through either local or global mapping of similar sequences. Researchers are still trying to improve the speed, scalability, and accuracy of network alignment. In view of this, we introduce a connected-components based fast algorithm, HopeMap, for network alignment. Observing that the size of true orthologs across species is small comparing to the total number of proteins in all species, we take a different approach based on a precompiled list of homologs identified by KO terms. Applying this approach to S. cerevisiae (yeast) and D. melanogaster (fly), E. coli K12 and S. typhimurium, E. coli K12 and C. crescenttus, we analyze all clusters identified in the alignment. The results are evaluated through up-to-date known gene annotations, gene ontology (GO), and KEGG ortholog groups (KO). Comparing to existing tools, our approach is fast with linear computational cost, highly accurate in terms of KO and GO terms specificity and sensitivity, and can be extended to multiple alignments easily. 1. Introduction Protein-protein interactions (PPI) are of central importance for virtually every process in a living cell. For example, information about these interactions improves our understanding of diseases and can provide the basis for new therapeutic approaches [1]. One of fundamental goals of system biology is to understand how proteins in the cell interact with each other. However, finding all protein interactions is costly and labor intensive. For example, to find all pairwise interactions for a species with 5000 proteins, one needs to do 12497500 pairwise tests. This is one reason that current known direct interactions are incomplete. High-throughput experimental techniques (e.g., yeast two-hybrid and coimmunoprecipitation test) can be helpful in this case. Integrated probability models are also used to predict the protein-protein interactions [1, 2]. Quite a few databases, DIP [3], IntAct [4], BioGRID [5], HPRD [6], and IntPro [7], are public available for collecting and storing PPI network data. Researchers [1, 8–14] are trying to identify conserved patterns such as ortholog groups and functional similar pathways/complexes across species using PPI network data. Figure 1 provides an example of global visualization of protein interaction networks. Figure 1: global visualization of protein interaction networks (from [ 15]). The exact
References
[1]
R. Sharan, S. Suthram, R. M. Kelley et al., “Conserved patterns of protein interaction in multiple species,” Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 6, pp. 1974–1979, 2005.
[2]
R. Sharan and T. Ideker, “Modeling cellular machinery through biological network comparison,” Nature Biotechnology, vol. 24, no. 4, pp. 427–433, 2006.
HPRD (Human Protein Reference Database), http://www.hprd.org/.
[7]
InterPro, http://www.ebi.ac.uk/interpro/.
[8]
J. Flannick, A. Novak, B. S. Srinivasan, H. H. McAdams, and S. Batzoglou, “General and robust alignment of multiple large interaction networks,” Genome Research, vol. 16, no. 9, pp. 1169–1181, 2006.
[9]
J. Flannick, A. Novak, C. B. Do, B. S. Srinivasan, and S. Batzoglou, “Automatic parameter learning for multiple local network alignment,” Journal of Computational Biology, vol. 16, no. 8, pp. 1001–1022, 2009.
[10]
M. Kalaev, V. Bafna, and R. Sharan, “Fast and accurate alignment of multiple protein networks,” Journal of Computational Biology, vol. 16, no. 8, pp. 989–999, 2009.
[11]
B. P. Kelley, R. Sharan, R. M. Karp et al., “Conserved pathways within bacteria and yeast as revealed by global protein network alignment,” Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 20, pp. 11394–11399, 2003.
[12]
M. Koyutürk, A. Grama, and W. Szpankowski, “Pairwise local alignment of protein interaction networks guided by models of evolution,” in Proceedings of the 9th Annual International Conference on Research in Computational Molecular Biology (RECOMB '05), vol. 3500, pp. 48–65, May 2005.
[13]
R. Singh, J. Xu, and B. Berger, “Pairwise global alignment of protein interaction networks by matching neighborhood topology,” in Proceedings of the 11th Annual International Conference on Research in Computational Molecular Biology (RECOMB '07), pp. 16–31, April 2007.
[14]
R. Singh, J. Xu, and B. Berger, “Global alignment of multiple protein interaction networks,” Pacific Symposium on Biocomputing, vol. 13, pp. 303–314, 2008.
[15]
S. Bandyopadhyay, R. Sharan, and T. Ideker, “Systematic identification of functional orthologs based on protein network comparison,” Genome Research, vol. 16, no. 3, pp. 428–435, 2006.
[16]
S. Maslov and K. Sneppen, “Specificity and stability in topology of protein networks,” Science, vol. 296, no. 5569, pp. 910–913, 2002.
[17]
A. Wagner, “The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes,” Molecular Biology and Evolution, vol. 18, no. 7, pp. 1283–1292, 2001.
[18]
B. Zhang, B. H. Park, T. Karpinets, and N. F. Samatova, “From pull-down data to protein interaction networks and complexes with biological relevance,” Bioinformatics, vol. 24, no. 7, pp. 979–986, 2008.
[19]
M. Ashburner, C. A. Ball, J. A. Blake et al., “Gene ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[20]
M. Kanehisa and S. Goto, “KEGG: Kyoto Encyclopedia of Genes and Genomes,” Nucleic Acids Research, vol. 28, no. 1, pp. 27–30, 2000.
[21]
M. R. Remm, C. E. V. Storm, and E. L. L. Sonnhammer, “Automatic clustering of orthologs and in-paralogs from pairwise species comparisons,” Journal of Molecular Biology, vol. 314, no. 5, pp. 1041–1052, 2001.
[22]
F. Towfic, M. H. W. Greenlee, and V. Honavar, Aligning Biomolecular Networks Using Modular Graph Kernels, vol. 5724 of Lecture Notes in Computer Science, 2009.
[23]
W. Chen, M. C. Schmidt, W. Tian, and N. F. Samatova, “A fast and accurate algorithm for identifying functional modules through pairwise local alignment of protein interaction,” in Proceedings of the International Conference on Bioinformatics and Computational Biology, Las Vegas, Nev, USA, July 2009.
[24]
A. Ruepp, A. Zollner, D. Maier et al., “The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes,” Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[25]
H. W. Mewes, C. Amid, R. Arnold, et al., “Munich information center for protein sequences,” Nucleic Acids Research, vol. 32, pp. D41–D44, 2004.
G. X. Yu, B. H. Park, P. Chandramohan, R. Munavalli, A. Geist, and N. F. Samatova, “In silico discovery of enzyme-substrate specificity-determining residue clusters,” Journal of Molecular Biology, vol. 352, no. 5, pp. 1105–1117, 2005.
[29]
B. V. North, D. Curtis, and P. C. Sham, “A note on the calculation of empirical P values from Monte Carlo procedures,” American Journal of Human Genetics, vol. 71, no. 2, pp. 439–441, 2002.
[30]
GO termFinder, http://go.princeton.edu/cgi-bin/GOTermFinder.
[31]
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, The MIT Press, 2nd edition, 2001.
W. Tian and N. F. Samatova, “Pairwise alignment of interaction networks by fast identification of maximal conserved patterns,” Pacific Symposium on Biocomputing, vol. 14, pp. 99–110, 2009.