ATP binding proteins comprise both proteins carrying the Walker A motif and universal stress proteins (USPs), which use an alternative motif to bind ATP. A reliable and comprehensive hybrid predictor for ATP binding proteins that uses whole-sequence information is therefore urgently needed. In this paper, the open source LIBSVM toolbox was used to build a classifier, evaluated by 10-fold cross-validation. The best hybrid model combined amino acid and dipeptide composition, with an accuracy of 84.57% and a Matthews correlation coefficient (MCC) of 0.693. This classifier performs better than many classical ATP binding protein predictors. In general, combinations of descriptors outperformed individual descriptors, particularly when combined with amino acid composition. The resulting model predicts ATP binding proteins irrespective of their functional motifs and gives molecular biologists a high probability of success in predicting and selecting diverse groups of ATP binding proteins.
1. Introduction
Recent advances in next-generation sequencing and the human genome projects have led to a rapid increase in the number of protein sequences, widening the protein sequence-structure gap [1, 2] and leading to diverse protein functions within a common family. Computational tools for predicting protein structure and function are greatly needed to narrow this gap [3]. The ATP binding proteins (ATP-BPs) are a diverse family of proteins in terms of amino acid sequence, function, and three-dimensional structure. These proteins hydrolyze ATP to provide the energy necessary to drive biochemical reactions in the cell [4]. There are two distinct functional groups of ATP binding proteins. The first functional group has the Walker A motif [GXXXXGK(T/S) or G-4X-GK(T/S)] in their sequences for ATP binding [5].
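As an illustration only, the Walker A pattern GXXXXGK(T/S) quoted above can be written as a regular expression and searched for in a one-letter amino acid sequence. The sketch below is not part of the method described in this paper; the function name `find_walker_a` and the example sequence are hypothetical. A sequence without a match may still bind ATP, since USPs use an alternative motif.

```python
import re

# Walker A motif as quoted in the text: G, any four residues, G, K, then T or S.
# The lookahead lets overlapping occurrences be reported as well.
WALKER_A = re.compile(r"(?=(G.{4}GK[TS]))")


def find_walker_a(sequence):
    """Return (0-based start position, matched segment) for each Walker A hit."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in WALKER_A.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical example sequence, not taken from any dataset in the paper.
    example = "MKGAPGSGKSTLL"
    for start, segment in find_walker_a(example):
        print(start, segment)
```

Such a motif scan only covers the first functional group, which is why a whole-sequence predictor is needed for proteins that lack the motif.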
Many members are transmembrane proteins responsible for transporting a wide variety of substrates across extra- and intracellular membranes [6]. The biochemical functions of ATP binding proteins are well illustrated by the ABC transporters. In bacterial cells, ABC transporters pump substances such as sugars, vitamins, and metal ions into the cell, while in eukaryotes they transport molecules out of the cell [7]. They are also known to transport lipids and to protect the developing fetus against xenobiotics [7]. ABC transporters are crucial in the development of multidrug
References
[1]
A. Bairoch and R. Apweiler, “The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000,” Nucleic Acids Research, vol. 28, no. 1, pp. 45–48, 2000.
[2]
H. M. Berman, J. Westbrook, Z. Feng et al., “The protein data bank,” Nucleic Acids Research, vol. 28, no. 1, pp. 235–242, 2000.
[3]
J. Guo, H. Chen, Z. Sun, and Y. Lin, “A novel method for protein secondary structure prediction using dual-layer SVM and profiles,” Proteins, vol. 54, no. 4, pp. 738–743, 2004.
[4]
C. Bustamante, Y. R. Chemla, N. R. Forde, and D. Izhaky, “Mechanical processes in biochemistry,” Annual Review of Biochemistry, vol. 73, pp. 705–748, 2004.
[5]
J. E. Walker, M. Saraste, M. J. Runswick, and N. J. Gay, “Distantly related sequences in the alpha- and beta-subunits of ATP synthase, myosin, kinases and other ATP-requiring enzymes and a common nucleotide binding fold,” The EMBO Journal, vol. 1, no. 8, pp. 945–951, 1982.
[6]
N. Hirokawa and R. Takemura, “Biochemical and molecular characterization of diseases linked to motor proteins,” Trends in Biochemical Sciences, vol. 28, no. 10, pp. 558–565, 2003.
[7]
C. Gedeon, J. Behravan, G. Koren, and M. Piquette-Miller, “Transport of glyburide by placental ABC transporters: implications in fetal drug exposure,” Placenta, vol. 27, no. 11-12, pp. 1096–1102, 2006.
[8]
A. Maxwell and D. M. Lawson, “The ATP-binding site of type II topoisomerases as a target for antibacterial drugs,” Current Topics in Medicinal Chemistry, vol. 3, no. 3, pp. 283–303, 2003.
[9]
H. Ashida, T. Oonishi, and N. Uyesaka, “Kinetic analysis of the mechanism of action of the multidrug transporter,” Journal of Theoretical Biology, vol. 195, no. 2, pp. 219–232, 1998.
[10]
K. Kvint, L. Nachin, A. Diez, and T. Nystrom, “The bacterial universal stress protein: function and regulation,” Current Opinion in Microbiology, vol. 6, no. 2, pp. 140–145, 2003.
[11]
T. Nystrom and F. C. Neidhardt, “Cloning, mapping and nucleotide sequencing of a gene encoding a universal stress protein in Escherichia coli,” Molecular Microbiology, vol. 6, no. 21, pp. 3187–3198, 1992.
[12]
A. Diez, N. Gustavsson, and T. Nystrom, “The universal stress protein A of Escherichia coli is required for resistance to DNA damaging agents and is regulated by a RecA/FtsK-dependent regulatory pathway,” Molecular Microbiology, vol. 36, no. 6, pp. 1494–1503, 2000.
[13]
M. C. Sousa and D. B. McKay, “Structure of the universal stress protein of Haemophilus influenzae,” Structure, vol. 9, no. 12, pp. 1135–1141, 2001.
[14]
V. J. Promponas, C. A. Ouzounis, and I. Iliopoulos, “Experimental evidence validating the computational inference of functional associations from gene fusion events: a critical survey,” Briefings in Bioinformatics, 2012.
[15]
J. S. Chauhan, N. K. Mishra, and G. P. Raghava, “Identification of ATP binding residues of a protein from its primary sequence,” BMC Bioinformatics, vol. 10, article 434, 2009.
[16]
T. Guo, Y. Shi, and Z. Sun, “A novel statistical ligand-binding site predictor: application to ATP-binding sites,” Protein Engineering, Design and Selection, vol. 18, no. 2, pp. 65–70, 2005.
[17]
K. Chen, M. J. Mizianty, and L. Kurgan, “ATPsite: sequence-based prediction of ATP-binding residues,” Proteome Science, vol. 9, article S4, supplement 1, 2011.
[18]
Y. N. Zhang, D. J. Yu, S. S. Li, Y. X. Fan, Y. Huang, and H. B. Shen, “Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features,” BMC Bioinformatics, vol. 13, article 118, 2012.
[19]
J. R. Green, M. J. Korenberg, R. David, and I. W. Hunter, “Recognition of adenosine triphosphate binding sites using parallel cascade system identification,” Annals of Biomedical Engineering, vol. 31, no. 4, pp. 462–470, 2003.
[20]
A. Garg, M. Bhasin, and G. P. S. Raghava, “Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search,” The Journal of Biological Chemistry, vol. 280, no. 15, pp. 14427–14432, 2005.
[21]
S. Ahmad, M. M. Gromiha, and A. Sarai, “Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information,” Bioinformatics, vol. 20, no. 4, pp. 477–486, 2004.
[22]
X. Xiao, P. Wang, and K.-C. Chou, “GPCR-CA: a cellular automaton image approach for predicting G-protein-coupled receptor functional classes,” Journal of Computational Chemistry, vol. 30, no. 9, pp. 1414–1423, 2009.
[23]
M. Kumar, M. M. Gromiha, and G. P. S. Raghava, “Prediction of RNA binding sites in a protein using SVM and PSSM profile,” Proteins, vol. 71, no. 1, pp. 189–194, 2008.
[24]
B. S. Williams, R. D. Isokpehi, A. N. Mbah et al., “Functional annotation analytics of Bacillus genomes reveals stress responsive acetate utilization and sulfate uptake in the biotechnologically relevant Bacillus megaterium,” Bioinformatics and Biology Insights, vol. 6, pp. 275–286, 2012.
[25]
R. D. Isokpehi, O. Mahmud, A. N. Mbah et al., “Developmental regulation of genes encoding universal stress proteins in Schistosoma mansoni,” Gene Regulation and Systems Biology, vol. 5, pp. 61–74, 2011.
[26]
A. N. Mbah, O. Mahmud, O. R. Awofolu, and R. D. Isokpehi, “Inferences on the biochemical and environmental regulation of universal stress proteins from Schistosomiasis parasites,” Advances and Applications in Bioinformatics and Chemistry, vol. 6, pp. 15–27, 2013.
[27]
W. Li, L. Jaroszewski, and A. Godzik, “Clustering of highly homologous sequences to reduce the size of large protein databases,” Bioinformatics, vol. 17, no. 3, pp. 282–283, 2001.
[28]
G. Wang and R. L. Dunbrack Jr., “PISCES: a protein sequence culling server,” Bioinformatics, vol. 19, no. 12, pp. 1589–1591, 2003.
[29]
X. Yu, J. Cao, Y. Cai, T. Shi, and Y. Li, “Predicting rRNA-, RNA-, and DNA-binding proteins from primary structure with support vector machines,” Journal of Theoretical Biology, vol. 240, no. 2, pp. 175–184, 2006.
[30]
A. Marchler-Bauer, C. Zheng, F. Chitsaz et al., “CDD: conserved domains and protein three-dimensional structure,” Nucleic Acids Research, vol. 41, pp. D348–D352, 2013.
[31]
Z. R. Li, H. H. Lin, L. Y. Han, L. Jiang, X. Chen, and Y. Z. Chen, “PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence,” Nucleic Acids Research, vol. 34, pp. W32–W37, 2006.
[32]
Z. Bikadi, I. Hazai, D. Malik et al., “Predicting P-glycoprotein-mediated drug transport based on support vector machine and three-dimensional crystal structure of P-glycoprotein,” PLoS ONE, vol. 6, no. 10, Article ID e25815, 2011.
[33]
S. L. Lo, C. Z. Cai, Y. Z. Chen, and M. C. M. Chung, “Effect of training datasets on support vector machine prediction of protein-protein interactions,” Proteomics, vol. 5, no. 4, pp. 876–884, 2005.
[34]
M. P. Brown, W. N. Grundy, D. Lin et al., “Knowledge-based analysis of microarray gene expression data by using support vector machines,” Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 1, pp. 262–267, 2000.
[35]
T. S. Furey, N. Cristianini, N. Duffy, D. W. Bednarski, M. Schummer, and D. Haussler, “Support vector machine classification and validation of cancer tissue samples using microarray expression data,” Bioinformatics, vol. 16, no. 10, pp. 906–914, 2000.
[36]
K.-C. Chou and Y.-D. Cai, “Predicting protein-protein interactions from sequences in a hybridization space,” Journal of Proteome Research, vol. 5, no. 2, pp. 316–322, 2006.
[37]
M. E. Matheny, F. S. Resnic, N. Arora, and L. Ohno-Machado, “Effects of SVM parameter optimization on discrimination and calibration for post-procedural PCI mortality,” Journal of Biomedical Informatics, vol. 40, no. 6, pp. 688–697, 2007.
[38]
F. Javed, G. S. Chan, A. V. Savkin et al., “RBF kernel based support vector regression to estimate the blood volume and heart rate responses during hemodialysis,” in Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC '09), pp. 4352–4355, 2009.
[39]
C.-C. Chang and C.-J. Lin, “Training nu-support vector classifiers: theory and algorithms,” Neural Computation, vol. 13, no. 9, pp. 2119–2147, 2001.
[40]
V. Cherkassky and Y. Ma, “Practical selection of SVM parameters and noise estimation for SVM regression,” Neural Networks, vol. 17, no. 1, pp. 113–126, 2004.
[41]
K. C. Chou and C. T. Zhang, “Prediction of protein structural classes,” Critical Reviews in Biochemistry and Molecular Biology, vol. 30, pp. 275–349, 1995.
[42]
C. Chen, L. Chen, X. Zou, and P. Cai, “Prediction of protein secondary structure content by using the concept of Chou's pseudo amino acid composition and support vector machine,” Protein and Peptide Letters, vol. 16, no. 1, pp. 27–31, 2009.
[43]
H. Ding, L. Luo, and H. Lin, “Prediction of cell wall lytic enzymes using Chou's amphiphilic pseudo amino acid composition,” Protein and Peptide Letters, vol. 16, no. 4, pp. 351–355, 2009.
[44]
J. Bondia, C. Tarin, W. Garcia-Gabin et al., “Using support vector machines to detect therapeutically incorrect measurements by the MiniMed CGMS,” Journal of Diabetes Science and Technology, vol. 2, pp. 622–629, 2008.
[45]
S. Chen, S. Zhou, F.-F. Yin, L. B. Marks, and S. K. Das, “Investigation of the support vector machine algorithm to predict lung radiation-induced pneumonitis,” Medical Physics, vol. 34, no. 10, pp. 3808–3814, 2007.
[46]
B. W. Matthews, “Comparison of the predicted and observed secondary structure of T4 phage lysozyme,” Biochimica et Biophysica Acta, vol. 405, no. 2, pp. 442–451, 1975.
[47]
L. Bao and Y. Cui, “Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information,” Bioinformatics, vol. 21, no. 10, pp. 2185–2190, 2005.
[48]
R. J. Dobson, P. B. Munroe, M. J. Caulfield, and M. A. S. Saqi, “Predicting deleterious nsSNPs: an analysis of sequence and structural attributes,” BMC Bioinformatics, vol. 7, article 217, 2006.
[49]
J. A. Hanley and B. J. McNeil, “The meaning and use of the area under a receiver operating characteristic (ROC) curve,” Radiology, vol. 143, no. 1, pp. 29–36, 1982.
[50]
E. R. DeLong, D. M. DeLong, and D. L. Clarke-Pearson, “Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach,” Biometrics, vol. 44, no. 3, pp. 837–845, 1988.
[51]
C. Chothia and A. M. Lesk, “The relation between the divergence of sequence and structure in proteins,” The EMBO Journal, vol. 5, no. 4, pp. 823–826, 1986.
[52]
A. M. Lesk and C. Chothia, “How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins,” Journal of Molecular Biology, vol. 136, no. 3, pp. 225–270, 1980.
[53]
M. Hilbert, G. Bohm, and R. Jaenicke, “Structural relationships of homologous proteins as a fundamental principle in homology modeling,” Proteins, vol. 17, no. 2, pp. 138–151, 1993.
[54]
P. I. Hanson and S. W. Whiteheart, “AAA+ proteins: have engine, will work,” Nature Reviews Molecular Cell Biology, vol. 6, no. 7, pp. 519–529, 2005.
[55]
K. M. Ferguson, T. Higashijima, M. D. Smigel, and A. G. Gilman, “The influence of bound GDP on the kinetics of guanine nucleotide binding to G proteins,” The Journal of Biological Chemistry, vol. 261, no. 16, pp. 7393–7399, 1986.
[56]
F. Jurnak, A. McPherson, A. H. J. Wang, and A. Rich, “Biochemical and structural studies of the tetragonal crystalline modification of the Escherichia coli elongation factor Tu,” The Journal of Biological Chemistry, vol. 255, no. 14, pp. 6751–6757, 1980.
[57]
T. I. Zarembinski, L. I.-W. Hung, H.-J. Mueller-Dieckmann et al., “Structure-based assignment of the biochemical function of a hypothetical protein: a test case of structural genomics,” Proceedings of the National Academy of Sciences of the United States of America, vol. 95, no. 26, pp. 15189–15193, 1998.
[58]
M. Saito, M. Go, and T. Shirai, “An empirical approach for detecting nucleotide-binding sites on proteins,” Protein Engineering, Design and Selection, vol. 19, no. 2, pp. 67–75, 2006.
[59]
V. Sobolev, A. Sorokine, J. Prilusky, E. E. Abola, and M. Edelman, “Automated analysis of interatomic contacts in proteins,” Bioinformatics, vol. 15, no. 4, pp. 327–332, 1999.
[60]
R. E. Schapire and Y. Singer, “BoosTexter: a boosting-based system for text categorization,” Machine Learning, vol. 39, no. 2-3, pp. 135–168, 2000.
[61]
S. A. Ong, H. H. Lin, Y. Z. Chen, Z. R. Li, and Z. Cao, “Efficacy of different protein descriptors in predicting protein functional families,” BMC Bioinformatics, vol. 8, article 300, 2007.
[62]
L. Xue and J. Bajorath, “Molecular descriptors in chemoinformatics, computational combinatorial chemistry, and virtual screening,” Combinatorial Chemistry and High Throughput Screening, vol. 3, no. 5, pp. 363–372, 2000.
[63]
L. Xue, J. W. Godden, and J. Bajorath, “Evaluation of descriptors and mini-fingerprints for the identification of molecules with similar activity,” Journal of Chemical Information and Computer Sciences, vol. 40, no. 5, pp. 1227–1234, 2000.