The fundamental aim of protein classification is to recognize the family of a given protein and determine its biological function. In the literature, the most common approaches are based on sequence or structure similarity comparisons; other methods use evolutionary distances between proteins. To improve classification performance, this work proposes a novel method, Consensus, which combines the decisions of several sequence and structure comparison tools to classify a given structure. Consensus also uses the evolutionary information of the compared structures. The method is tested on three databases and evaluated under several criteria. The evaluation shows that Consensus outperforms each of its component classifiers used separately and achieves higher classification performance than an alignment-free method, ProtClass.