Several efficient algorithms to conduct pairwise comparisons among large databases of protein structures have emerged in the recent literature. The central theme is the design of a measure between the Cα atoms of two protein chains, from which dynamic programming is used to compute an alignment. The efficiency and efficacy of these algorithms allows large-scale computational studies that would have been previously impractical. The computational study herein shows that the structural alignment algorithm eigen-decomposition alignment with the spectrum (EIGAs) is robust against both parametric and structural variation.
References
[1]
Andonov, R.; Malod-Dognin, N.; Yanev, N. Maximum contact map overlap revisited. J. Comput. Biol. 2011, 18, 27–41, doi:10.1089/cmb.2009.0196. 21210730
[2]
Andonov, R.; Yanev, N.; Malod-Dognin, N. An Efficient Lagrangian Relaxation for the Contact Map Overlap Problem. Proceedings of the 8th International Workshop on Algorithms in Bioinformatics, Karlsruhe, Germany, 15–19 September 2008; Springer-Verlag: Berlin/Heidelberg, Germany, 2008; pp. 162–173.
[3]
Hasegawa, H.; Holm, L. Advances and pitfalls of protein structural alignment. Curr. Opin. Struct. Biol. 2009, 19, 341–348. 19481444
[4]
Li, S.C.; Ng, Y.K. On protein structure alignment under distance constraint. Theor. Comput. Sci. 2011, 412, 4187–4199, doi:10.1016/j.tcs.2010.11.045.
[5]
Menke, M.; Berger, B.; Cowen, L. Matt: Local flexibility aids protein multiple structure alignment. PLoS Comput. Biol. 2008, 4, e10, doi:10.1371/journal.pcbi.0040010. 18193941
[6]
Poleksic, A. Algorithms for optimal protein structure alignment. Bioinformatics 2009, 25, 2751–2756, doi:10.1093/bioinformatics/btp530. 19734152
[7]
Prlic, A.; Bliven, S.; Rose, P.W.; Bluhm, W.F.; Bizon, C.; Godzik, A.; Bourne, P.E. Pre-calculated protein structure alignments at the RCSB PDB website. Bioinformatics 2010, 26, 2983–2985, doi:10.1093/bioinformatics/btq572. 20937596
[8]
Shibberu, Y.; Holder, A. A spectral approach to protein structure alignment. IEEE/ACM Trans. Comput. Biol. Bioinform. 2011, 8, 867–875, doi:10.1109/TCBB.2011.24. 21301031
[9]
Bonnel, N.; Mareau, P. LNA: Fast Protein Classification Using A Laplacian Characterization of Tertiary Structure.; Technical Report for IRISA: Vannes, France, 2012.
[10]
Kifer, I.; Nussinov, R.; Wolfson, H.J. GOSSIP: A method for fast and accurate global alignment of protein structure. Bioinformatics 2011, 27, 925–932, doi:10.1093/bioinformatics/btr044. 21296751
[11]
Bhattacharya, S.; Bhattacharyya, C.; Chandra, N.R. Projections for fast protein structure retrieval. BMC Bioinform. 2006, 7 Suppl. 5, S5.
[12]
Lena, P.D.; Fariselli, P.; Margara, L.; Vassura, M.; Casadio, R. Fast overlapping of protein contact maps by alignment of eigenvectors. Bioinformatics 2010, 26, 2250–2258, doi:10.1093/bioinformatics/btq402. 20610612
[13]
Liu, W.; Srivastava, A.; Zhang, J. A mathematical framework for protein structure comparison. PLoS Comput. Biol. 2011, 7, e1001075, doi:10.1371/journal.pcbi.1001075. 21304929
[14]
Lu, Z.; Zhao, Z.; Fu, B. Efficient protein alignment algorithm for protein search. BMC Bioinform. 2010, 11 Suppl. 1, S34, doi:10.1186/1471-2105-11-S1-S34.
[15]
Mavridis, L.; Ritchie, D.W. 3d-blast: 3d protein structure alignment, comparison, and classification using spherical polar fourier correlations. Pac Symp. Biocomput. 2010, doi:10.1142/9789814295291 0030.
[16]
Mirceva, G.; Cingovska, I.; Dimov, Z.; Davcev, D. Efficient approaches for retrieving protein tertiary structures. IEEE/ACM Trans. Comput. Biol. Bioinform. 2012, 9, 1166–1179, doi:10.1109/TCBB.2011.138. 22025763
[17]
Novosd, T.; Snel, V.; Abraham, A.; Yang, J.Y. Searching protein 3-D structures for optimal structure alignment using intelligent algorithms and data structures. IEEE Trans. Inf. Technol. Biomed. 2010, 14, 1378–1386, doi:10.1109/TITB.2010.2079939. 20876026
[18]
Poleksic, A. Optimal pairwise alignment of fixed protein structures in subquadratic time. J. Bioinform. Comput. Biol. 2011, 9, 367–382, doi:10.1142/S0219720011005562. 21714130
[19]
Shibuya, T.; Jansson, J.; Sadakane, K. Linear-time protein 3-D structure searching with insertions and deletions. Algorithms Mol. Biol. 2010, 5, 7, doi:10.1186/1748-7188-5-7. 20047663
[20]
Holm, L.; Sander, C. Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 1993, 233, 123–138, doi:10.1006/jmbi.1993.1489. 8377180
[21]
Shindyalov, I.N.; Bourne, P.E. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 1998, 11, 739–747, doi:10.1093/protein/11.9.739. 9796821
[22]
Zhang, Y.; Skolnick, J. TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005, 33, 2302–2309, doi:10.1093/nar/gki524. 15849316
[23]
Orengo, C.A.; Taylor, W.R. SSAP: Sequential structure alignment program for protein structure comparison. Meth. Enzymol. 1996, 266, 617–635. 8743709
[24]
Pang, B.; Zhao, N.; Becchi, M.; Korkin, D.; Shyu, C.R. Accelerating large-scale protein structure alignments with graphics processing units. BMC Res. Notes 2012, 5, 116, doi:10.1186/1756-0500-5-116. 22357132
[25]
Holm, L.; Kriinen, S.; Rosenstrm, P.; Schenkel, A. Searching protein structure databases with DaliLite v.3. Bioinformatics 2008, 24, 2780–2781, doi:10.1093/bioinformatics/btn507. 18818215
[26]
Pascual-Garca, A.; Abia, D.; Ortiz, A.R.; Bastolla, U. Cross-over between discrete and continuous protein structure space: Insights into automatic classification and networks of protein structures. PLoS Comput. Biol. 2009, 5, e1000331, doi:10.1371/journal.pcbi.1000331. 19325884
[27]
Redfern, O.C.; Harrison, A.; Dallman, T.; Pearl, F.M.G.; Orengo, C.A. CATHEDRAL: A fast and effective algorithm to predict folds and domain boundaries from multidomain protein structures. PLoS Comput. Biol. 2007, 3, e232, doi:10.1371/journal.pcbi.0030232. 18052539
[28]
Budowski-Tal, I.; Nov, Y.; Kolodny, R. FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately. Proc. Natl. Acad. Sci. USA 2010, 107, 3481–3486, doi:10.1073/pnas.0914097107. 20133727
[29]
Bellman, R. Dynamic Programming; Princeton University Press: Princeton, NJ, USA, 1957.
[30]
Needleman, S.B.; Wunsch, C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 1970, 48, 443–453, doi:10.1016/0022-2836(70)90057-4. 5420325
[31]
Goonesekere, N.C.W.; Lee, B. Frequency of gaps observed in a structurally aligned protein pair database suggests a simple gap penalty function. Nucleic Acids Res. 2004, 32, 2838–2843, doi:10.1093/nar/gkh610. 15155852
[32]
Madhusudhan, M.S.; Marti-Renom, M.A.; Sanchez, R.; Sali, A. Variable gap penalty for protein sequence-structure alignment. Protein Eng. Des. Sel. 2006, 19, 129–133, doi:10.1093/protein/gzj005. 16423846
Salem, S.; Zaki, M.J.; Bystroff, C. FlexSnap: Flexible non-sequential protein structure alignment. Algorithms Mol. Biol. 2010, 5, 12, doi:10.1186/1748-7188-5-12. 20047669
[35]
Schmidt-Goenner, T.; Guerler, A.; Kolbeck, B.; Knapp, E.W. Circular permuted proteins in the universe of protein folds. Proteins 2010, 78, 1618–1630, doi:10.1002/prot.22678. 20112421
[36]
Poleksic, A. On complexity of protein structure alignment problem under distance constraint. IEEE/ACM Trans. Comput. Biol. Bioinform. 2012, 9, 511–516, doi:10.1109/TCBB.2011.133. 22025757
[37]
Shibberu, Y.; Holder, A.; Lutz, K. Fast Protein Structure Alignment; LNCS (LNBI).; Springer-Verlag: Berlin/Heidelberg, Germany, 2010; Volume 6053, pp. 152–165.
[38]
Homepage of SWIG. Available online: www.swig.org (accessed on 22 October 2013).
[39]
Cock, P.J.A.; Antao, T.; Chang, J.T.; Chapman, B.A.; Cox, C.J.; Dalke, A.; Friedberg, I.; Hamelryck, T.; Kauff, F.; Wilczynski, B.; de Hoon, MJ.L. Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009, 25, 1422–1423, doi:10.1093/bioinformatics/btp163. 19304878
[40]
Andreeva, A.; Howorth, D.; Chandonia, J.M.; Brenner, S.E.; Hubbard, T.J.P.; Chothia, C.; Murzin, A.G. Data growth and its impact on the SCOP database: New developments. Nucleic Acids Res. 2008, 36, D419–D425. 18000004
[41]
Andreeva, A.; Murzin, A.G. Structural classification of proteins and structural genomics: New insights into protein folding and evolution. Acta Crystallogr. Sect. F Struct. Biol. Cryst. Commun. 2010, 66, 1190–1197, doi:10.1107/S1744309110007177. 20944210
[42]
Conte, L.L.; Ailey, B.; Hubbard, T.J.; Brenner, S.E.; Murzin, A.G.; Chothia, C. SCOP: A structural classification of proteins database. Nucleic Acids Res. 2000, 28, 257–259, doi:10.1093/nar/28.1.257. 10592240
[43]
Ritchie, D.W.; Ghoorah, A.W.; Mavridis, L.; Venkatraman, V. Fast protein structure alignment using Gaussian overlap scoring of backbone peptide fragment similarity. Bioinformatics 2012, 28, 3274–3281, doi:10.1093/bioinformatics/bts618. 23093609
[44]
Delano, W.L. The PyMOL Molecular Graphics System; DeLano Scientific: San Carlos, CA, USA, 2002.
[45]
The chain d1fvqa, which is copper-transporting ATPase, has a single model and no B-factor. Hence, this chain was constant throughout the study.