Post-translational modification (PTM) increases the functional diversity of proteins by
introducing new functional groups to the side chains of their amino acids.
Among all amino acid residues, the side chain of lysine (K) can undergo many
types of PTM, collectively called K-PTM, such as acetylation, crotonylation, methylation and succinylation.
Several of these modifications can also occur at the same lysine residue of a protein, which makes multi-label PTM site
identification necessary. However, most existing computational methods have been
established to predict single-label PTM sites; very few have been developed to address
the multi-label problem, and those leave room for improvement. Here, we have developed a
computational tool termed mLysPTMpred to predict multi-label lysine PTM sites
by 1) incorporating sequence-coupled information into the general pseudo
amino acid composition, 2) balancing the effect of the skewed training dataset with the
Different Error Cost method, and 3) constructing a multi-label predictor from
a combination of support vector machines (SVMs). This predictor achieved 83.73%
accuracy in predicting the multi-label PTM sites of the K-PTM types. Moreover, in all
experimental results, including accuracy, it outperformed the existing
predictor iPTM-mLys. A user-friendly web server for mLysPTMpred is available at http://research.ru.ac.bd/mLysPTMpred/.
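As a rough illustration of steps 2) and 3), the sketch below treats each K-PTM type as its own binary problem and derives class-dependent error costs from the class counts. The ratio rule used here (positive cost / negative cost = number of negatives / number of positives) is a common heuristic and is an assumption, not necessarily the setting used in mLysPTMpred; the function names, the toy data, and the set-overlap accuracy measure are likewise hypothetical choices made for this example.

```python
from typing import Dict, List, Set

# The four K-PTM types named in the abstract.
LABELS = ["acetylation", "crotonylation", "methylation", "succinylation"]


def dec_costs(samples: List[Set[str]], base_cost: float = 1.0) -> Dict[str, Dict[str, float]]:
    """Per-label error costs for a one-binary-classifier-per-label setup.

    Assumed heuristic: cost_pos / cost_neg = n_neg / n_pos, so mistakes on
    the rarer class are penalised more heavily.
    """
    costs: Dict[str, Dict[str, float]] = {}
    for label in LABELS:
        n_pos = sum(1 for labels in samples if label in labels)
        n_neg = len(samples) - n_pos
        if n_pos == 0 or n_neg == 0:
            raise ValueError(f"label {label!r} needs both positive and negative samples")
        costs[label] = {"neg": base_cost, "pos": base_cost * n_neg / n_pos}
    return costs


def multilabel_accuracy(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Mean of |true AND pred| / |true OR pred| over all samples.

    This is one common multi-label accuracy definition, assumed here.
    A sample with no true and no predicted label counts as fully correct.
    """
    if len(true) != len(pred) or not true:
        raise ValueError("true and pred must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(true, pred):
        union = t | p
        total += 1.0 if not union else len(t & p) / len(union)
    return total / len(true)


if __name__ == "__main__":
    # Toy label sets for eight lysine sites (hypothetical data).
    toy = [
        {"acetylation"},
        {"acetylation", "methylation"},
        set(),
        {"succinylation"},
        {"crotonylation", "acetylation"},
        set(),
        set(),
        {"methylation"},
    ]
    for label, cost in dec_costs(toy).items():
        print(label, cost)
```

In a library such as scikit-learn, costs like these would typically be supplied to each per-label SVM as class weights (for example through the `class_weight` parameter of `SVC`), and the four binary outputs would then be combined into one predicted label set per site.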