R Practice on Wine Datasets Based on Classification Algorithms Such as XGBoost

DOI: 10.12677/HJDM.2022.122019, PP. 182-191

Keywords: Logistic Regression, Random Forests, Decision Trees, Support Vector Machines, Extreme Gradient Boosting Trees

Abstract:

This paper uses R to study how several machine learning classification algorithms (decision tree, random forest, and support vector machine) perform on a wine dataset. Their accuracies were 67.24%, 68.15%, and 66.25%, respectively, a fairly good showing. Random forest performed best of the three and the support vector machine worst, although the differences among them were small. Because all three left considerable room for improvement in accuracy, extreme gradient boosting (XGBoost), which improves on decision trees and random forests, was introduced for classification; it gave the best result, an accuracy of 73.59%. Even so, applying these four machine learning algorithms in R directly to the wine dataset yields only moderate classification performance, and further improvement is still needed.
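
The comparison in the abstract can be checked with a few lines of arithmetic. The sketch below is not the paper's code (the experiments themselves were run in R); it only ranks the four reported accuracies and computes two gaps: the spread among the three baseline models, and the gain of XGBoost over the best of them. The dictionary and function names are hypothetical, chosen for illustration.

```python
# Accuracies (%) exactly as reported in the abstract.
reported_accuracy = {
    "decision tree": 67.24,
    "random forest": 68.15,
    "support vector machine": 66.25,
    "XGBoost": 73.59,
}


def rank_models(acc):
    """Return (model, accuracy) pairs sorted from best to worst."""
    return sorted(acc.items(), key=lambda kv: kv[1], reverse=True)


def baseline_spread(acc, candidate="XGBoost"):
    """Gap in percentage points between the best and worst baseline."""
    baselines = [v for k, v in acc.items() if k != candidate]
    return round(max(baselines) - min(baselines), 2)


def gain_over_best_baseline(acc, candidate="XGBoost"):
    """Percentage points by which the candidate beats the best baseline."""
    best_baseline = max(v for k, v in acc.items() if k != candidate)
    return round(acc[candidate] - best_baseline, 2)


for model, accuracy in rank_models(reported_accuracy):
    print(f"{model}: {accuracy:.2f}%")

print("baseline spread (points):", baseline_spread(reported_accuracy))
print("XGBoost gain (points):", gain_over_best_baseline(reported_accuracy))
```

The three baselines lie within 1.90 percentage points of one another, while XGBoost is 5.44 points ahead of random forest, which matches the abstract's statement that the baselines differ little and XGBoost performs best.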
