Biophysics 2024
An Ensemble-Model-Based Method for Predicting Protein Allosteric Sites
Abstract:
Allostery is an important mechanism for regulating protein function and is essential to many biological processes. Compared with orthosteric modulators, allosteric modulators offer higher specificity and lower toxicity, which gives allosteric drug design an advantage over orthosteric drug design. Discovering allosteric sites is a prerequisite for allosteric drug design, yet most experimentally determined allosteric sites have been found by chance, so effective theoretical methods for predicting protein allosteric sites are urgently needed. Here we present AllosEC, an ensemble machine learning method for predicting protein allosteric pockets. In addition to the physicochemical properties of each pocket, AllosEC uses the pocket's secondary structure information, depth index (DPX), and protrusion index (CX) as features. To address the extreme imbalance between positive and negative samples, the training dataset is balanced with an undersampling method. On the independent test set, AllosEC outperforms existing methods on multiple evaluation metrics, with a sensitivity (SEN), specificity (SPE), precision (PRE), and Matthews correlation coefficient (MCC) of 0.708, 0.915, 0.405, and 0.486, respectively. AllosEC thus provides a well-performing method for protein allosteric site prediction.
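The abstract mentions balancing the training set by undersampling and reports SEN, SPE, PRE, and MCC. The following is a minimal Python sketch of those two ideas, not the AllosEC implementation. It assumes plain random undersampling of the majority class (the abstract does not say which undersampling scheme was used) and binary labels where 1 marks an allosteric pocket. The function names `undersample` and `binary_metrics` are hypothetical and chosen only for illustration.

```python
import math
import random


def undersample(samples, labels, seed=0):
    """Randomly drop majority-class samples until both classes are equal in size.

    Assumption: simple random undersampling; the source does not specify the scheme.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority sample plus an equally sized random subset of the majority.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [samples[i] for i in kept], [labels[i] for i in kept]


def binary_metrics(y_true, y_pred):
    """Compute SEN, SPE, PRE, and MCC from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sen = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity (recall)
    spe = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    pre = tp / (tp + fp) if (tp + fp) else 0.0  # precision
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SEN": sen, "SPE": spe, "PRE": pre, "MCC": mcc}


if __name__ == "__main__":
    # Toy data: 2 positive pockets among 10 candidates (illustrative only).
    toy_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    toy_samples = [f"pocket_{i}" for i in range(len(toy_labels))]
    balanced_samples, balanced_labels = undersample(toy_samples, toy_labels)
    print(balanced_samples, balanced_labels)

    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
```

With many more negative than positive pockets, precision can stay modest even when sensitivity and specificity are high, which is why MCC is a useful summary for imbalanced test sets of this kind.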