%0 Journal Article
%T Prediction and Optimization of Anticancer Drug Activity: A Case Study of Breast Cancer
%A 候爽
%J Statistics and Applications
%P 1338-1347
%@ 2325-226X
%D 2022
%I Hans Publishing
%R 10.12677/SA.2022.116139
%X This paper uses machine learning methods to explore the value of statistical models in pharmaceutical research. Taking drug candidates for breast cancer treatment as an example, two criteria are considered during development: 1) identify compounds that inhibit ERα, reducing ERα activity as much as possible; 2) the drug should have favorable ADMET properties. Using data mining techniques and basic statistical methods, the paper first applies a random forest algorithm to select 20 main variables from 504 candidates. It then uses the XGBoost algorithm to model the nonlinear relationship between compound structure (i.e., molecular descriptors) and ERα biological activity, and to predict that activity; the reported prediction accuracy is 92.74%. Finally, a particle swarm optimization algorithm is used to build an optimization model that computes specific molecular descriptor values while ensuring the compound retains good biological activity and ADMET properties. These results provide a reference for drug design.
%K Random Forest
%K XGBoost
%K Particle Swarm Optimization Algorithm
%K Optimization Problem
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=58894