Construction of a Bioactivity Prediction Model for Anti-Breast Cancer Drug Candidates
Abstract:
Breast cancer is currently one of the diseases with a high incidence, and identifying compounds that antagonize ERα activity is important for its treatment. This paper uses the bioactivity of ERα antagonists, expressed as the pIC50 value (which is positively correlated with bioactivity), to screen candidate drugs for breast cancer. To build a model that predicts how strongly a compound inhibits ERα, we first reduce the dimensionality of the data by selecting the main variables with the random forest algorithm and the distance correlation coefficient. We then establish, train, and validate a pIC50 prediction model based on a BP (back-propagation) neural network. Finally, to search for the global optimum over the selected variables, we apply the particle swarm optimization algorithm with the maximum pIC50 value as the objective function and run it with preset parameters to obtain the optimization result. The results show that the maximum pIC50 value and the corresponding molecular descriptor values fall within reasonable ranges, which suggests that the model has a certain degree of stability and soundness.
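To illustrate the variable-screening step, the sample distance correlation between one descriptor and the response can be sketched as follows. This is a minimal NumPy sketch based on the standard definition by Székely, Rizzo and Bakirov (2007), not the implementation used in this study; the function names `centered_distances` and `distance_correlation` and the toy data are assumptions introduced only for illustration.

```python
import numpy as np


def centered_distances(v):
    """Pairwise Euclidean distance matrix of the samples, double-centered."""
    v = np.asarray(v, dtype=float)
    v = v.reshape(len(v), -1)  # treat 1-D input as n samples of dimension 1
    diff = v[:, None, :] - v[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    row_mean = d.mean(axis=1, keepdims=True)
    col_mean = d.mean(axis=0, keepdims=True)
    return d - row_mean - col_mean + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation in [0, 1]; returns 0.0 if either input is constant."""
    a = centered_distances(x)
    b = centered_distances(y)
    dcov2 = (a * b).mean()                   # squared sample distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()  # product of squared distance variances
    if dvar2 <= 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


# Toy data (hypothetical): a symmetric descriptor and two candidate responses.
x = np.linspace(-1.0, 1.0, 201)
linear_dcor = distance_correlation(x, 2.0 * x + 1.0)
quadratic_dcor = distance_correlation(x, x ** 2)
quadratic_pearson = np.corrcoef(x, x ** 2)[0, 1]
```

In this toy case the Pearson coefficient of `x` and `x ** 2` is essentially zero, while the distance correlation is clearly positive, which is the reason for using it to screen descriptors whose relationship with pIC50 may be nonlinear. The cut-off used to keep or discard a descriptor is not specified here and would have to be chosen separately.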