%0 Journal Article
%T An Identification Method for Breast Cancer Subtypes Based on Data Mining Technology
%A YANG Shao-hua
%A CHEN Dong-dong
%A ZHANG Xu
%A HE Lin
%J Journal of Southwest University (Natural Science Edition)
%D 2018
%R 10.13718/j.cnki.xdzk.2018.05.018
%X The random forest algorithm can rank features by importance and can improve both computational efficiency and classification accuracy. In this study, analysis of variance (ANOVA) and the random forest algorithm were used to screen breast cancer genes. Using the selected genes, the random forest, SVM (support vector machine), and KNN (k-nearest neighbor) classifiers reached test-set accuracies of 95.6%, 92.9%, and 92.7%, respectively. Two genes, GATA3 and ESR1, were identified as the most important for distinguishing the different subtypes of breast cancer.
%K data mining
%K microarray
%K breast cancer
%K classification
%U http://xbgjxt.swu.edu.cn/jsuns/html/jsuns/2018/5/201805018.htm