A Study of Heart Disease Prediction Model Based on Smote-XGBoost Algorithm

DOI: 10.12677/HJDM.2022.123003, pp. 220-234

Keywords: Heart Disease Prediction, SMOTE-ENN Algorithm, XGBoost Algorithm, Confusion Matrix

Abstract:

The proposed model first balances the distribution of the training data with SMOTE-ENN (the Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) and then predicts heart disease with XGBoost, an ensemble learning algorithm. To validate the model, this paper uses a real medical dataset of heart disease patients, selects features through expert consultation, and evaluates the model with a confusion matrix. Compared with four baseline algorithms, the proposed model performs well on all four metrics: AUC, Accuracy, Recall, and F-Score. The experimental results show that the proposed model can provide a more accurate and intelligent auxiliary reference for heart disease prediction and can, to some extent, improve both diagnostic efficiency and the accuracy of heart disease prediction.
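The SMOTE-ENN balancing step can be illustrated with a minimal, dependency-free sketch. It assumes a binary label with the minority class (patients with heart disease) encoded as `1`. SMOTE adds synthetic minority samples on the line segments between a minority sample and one of its nearest minority neighbors until both classes have the same size, and ENN then removes every sample whose label disagrees with the majority label of its nearest neighbors (the classic majority-vote editing rule; other variants exist). The function names, the neighbor counts (`k_smote=5`, `k_enn=3`), and the toy data are hypothetical choices for illustration, not the paper's configuration. In practice a library implementation such as `imblearn.combine.SMOTEENN` would normally be used.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _nearest(i, X, k):
    """Indices of the k nearest neighbors of X[i] within X (excluding i itself)."""
    others = [j for j in range(len(X)) if j != i]
    return sorted(others, key=lambda j: _dist(X[i], X[j]))[:k]


def smote(minority, n_new, k=5, rng=None):
    """SMOTE: create n_new synthetic samples, each on the segment between a random
    minority sample and one of its k nearest minority neighbors.
    Needs at least 2 minority samples when n_new > 0."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(_nearest(i, minority, k))
        gap = rng.random()  # random position along the segment minority[i] -> minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def enn(X, y, k=3):
    """ENN (majority-vote rule): keep a sample only if most of its k nearest
    neighbors share its label."""
    keep = []
    for i in range(len(X)):
        neighbors = _nearest(i, X, k)
        agree = sum(1 for j in neighbors if y[j] == y[i])
        if 2 * agree > len(neighbors):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority_label=1, k_smote=5, k_enn=3, seed=0):
    """Hypothetical helper: oversample the minority class up to the majority
    size with SMOTE, then clean the combined set with ENN."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = max(0, (len(X) - len(minority)) - len(minority))
    new = smote(minority, n_new, k_smote, rng)
    return enn(list(X) + new, list(y) + [minority_label] * len(new), k_enn)


if __name__ == "__main__":
    # Made-up 2-feature toy data: 8 samples of class 0, 3 samples of class 1
    X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30], [0.30, 0.20],
         [0.25, 0.25], [0.20, 0.35], [0.10, 0.10], [0.35, 0.30],
         [0.80, 0.90], [0.90, 0.80], [0.85, 0.95]]
    y = [0] * 8 + [1] * 3
    X_bal, y_bal = smote_enn(X, y)
    print(y_bal.count(0), y_bal.count(1))
```

Consistent with the abstract, only the training data are resampled this way. The held-out data keep their original class ratio, so the confusion-matrix metrics reflect the real distribution of patients.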
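The evaluation metrics named in the abstract can be computed from the confusion matrix, with AUC additionally using the model's predicted scores. The plain-Python sketch below assumes the positive class (heart disease) is `1`, treats "F-Score" as the F1 score (β = 1), and computes AUC as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, which equals the area under the ROC curve. The function names, the 0.5 decision threshold, and the example labels and scores are hypothetical and only illustrate the arithmetic.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) counts for a binary classification."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, Recall (sensitivity) and F-Score derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_matrix(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "f_score": f_beta(precision, recall),
    }


def auc(y_true, scores, positive=1):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels (1 = heart disease) and model scores, for illustration only
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 decision threshold
    print(confusion_matrix(y_true, y_pred))  # (2, 1, 4, 1)
    print(evaluate(y_true, y_pred))
    print(round(auc(y_true, scores), 4))  # 0.9333
```

With these toy values, TP = 2, FP = 1, TN = 4 and FN = 1, so Accuracy = 6/8 = 0.75, Recall = 2/3 and F1 = 2/3, while 14 of the 15 positive-negative pairs are ranked correctly, giving AUC = 14/15. Recall is usually emphasized in a screening setting like this one, because each false negative is a missed patient.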
