mLysPTMpred: Multiple Lysine PTM Site Prediction Using Combination of SVM with Resolving Data Imbalance Issue

DOI: 10.4236/ns.2018.109035, PP. 370-384

Keywords: Multi-Label PTM Site Predictor, Sequence-Coupling Model, General PseAAC, Data Imbalance Issue, Different Error Costs, Support Vector Machine


Abstract:

Post-translational modification (PTM) increases the functional diversity of proteins by introducing new functional groups to the side chains of their amino acids. Among all amino acid residues, the side chain of lysine (K) can undergo many types of PTM, called K-PTM, such as “acetylation”, “crotonylation”, “methylation” and “succinylation”. Multiple PTMs can also occur at the same lysine of a protein, which calls for multi-label PTM site identification. However, most existing computational methods were built to predict single-label PTM sites; very few address the multi-label problem, and those need further improvement. Here, we have developed a computational tool termed mLysPTMpred to predict multi-label lysine PTM sites by 1) incorporating sequence-coupled information into the general pseudo amino acid composition, 2) balancing the effect of the skewed training dataset with the Different Error Costs method, and 3) constructing a multi-label predictor from a combination of support vector machines (SVMs). This predictor achieved 83.73% accuracy in predicting multi-label K-PTM sites. Moreover, on accuracy and all other reported experimental results, it outperformed the existing predictor iPTM-mLys. A user-friendly web server for mLysPTMpred is available at http://research.ru.ac.bd/mLysPTMpred/.
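
Steps 2) and 3) of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the error costs are chosen, so the ratio heuristic below (positive-class cost scaled by the negative-to-positive count ratio) is an assumption, and the names `dec_costs`, `combine_labels` and `PTM_TYPES` are hypothetical. The sketch only computes per-class costs and merges one binary decision per PTM type into a multi-label prediction; training the SVMs themselves is left out.

```python
from typing import Dict, List, Sequence

# The four K-PTM types named in the abstract.
PTM_TYPES = ("acetylation", "crotonylation", "methylation", "succinylation")


def dec_costs(labels: Sequence[int], base_cost: float = 1.0) -> Dict[int, float]:
    """Per-class misclassification costs for binary labels (1 = site, 0 = non-site).

    Assumed heuristic (not taken from the paper): the minority positive class
    receives a cost scaled by n_neg / n_pos, so errors on rare positives weigh
    as much in total as errors on the abundant negatives.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in the training labels")
    return {1: base_cost * n_neg / n_pos, 0: base_cost}


def combine_labels(decisions: Dict[str, int]) -> List[str]:
    """Merge one binary decision per PTM type into a multi-label prediction.

    `decisions` maps a PTM type to 1 (predicted modified) or 0; types missing
    from the mapping are treated as 0.
    """
    return [ptm for ptm in PTM_TYPES if decisions.get(ptm, 0) == 1]


if __name__ == "__main__":
    # Toy training labels for one PTM type: 2 positive sites, 8 negative sites.
    toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(dec_costs(toy_labels))

    # Toy outputs of four independent binary classifiers for one lysine.
    print(combine_labels({"acetylation": 1, "methylation": 1, "succinylation": 0}))
```

With an SVM library, such per-class costs would typically be supplied as class weights (for example, the `class_weight` argument of scikit-learn's `SVC`), with one classifier trained per PTM type.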

