Heart disease remains a leading cause of death worldwide, so reliable prediction models are needed to support early detection and treatment. This study investigates the Extra Tree (Extremely Randomized Trees) algorithm for heart disease prediction. The work covers data preparation, model training, and performance evaluation using accuracy, precision, recall, and F1-score. The dataset, obtained from the University of California, Irvine (UCI) Machine Learning Repository, contains about 319,796 instances and 18 attributes covering medical and demographic variables related to heart disease; these attributes serve as the features. Recursive feature elimination with a Random Forest estimator reduced the number of features from 18 to 7. With an 80%/20% train/test split, the Extra Tree model achieved accuracy, precision, recall, and F1-score of 93.1%, 94.8%, 100%, and 93.1%, respectively, and outperformed a number of baseline models in accuracy and predictive power. The study concludes that the model could be integrated into a clinical decision support system to help healthcare providers diagnose cardiac disease. In addition, the feature importance analysis can help direct future research toward the most significant risk factors for cardiovascular disease.
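The workflow summarized above (recursive feature elimination with a Random Forest estimator from 18 features down to 7, followed by an Extra Trees classifier evaluated on an 80/20 split) can be sketched with scikit-learn as follows. This is a minimal illustration, not the study's actual code: the data are synthetic stand-ins generated with `make_classification`, and the tree counts, random seeds, stratified split, and the choice to fit the selector on the training portion only are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart disease data (assumption): 18 features,
# binary target. The real dataset is much larger and is not loaded here.
X, y = make_classification(
    n_samples=2000, n_features=18, n_informative=7, n_redundant=3, random_state=0
)

# 80% train / 20% test split, as in the abstract. Stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Recursive feature elimination with a Random Forest estimator, keeping 7 features.
# The selector is fitted on the training data only (an assumption) so that the
# test set plays no part in choosing features.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=7,
    step=1,
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Extra Trees (Extremely Randomized Trees) classifier on the reduced feature set.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X_train_sel, y_train)
y_pred = model.predict(X_test_sel)

# The four evaluation measures named in the abstract.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the data here are synthetic, the printed scores bear no relation to the figures reported in the study; the sketch only shows how the selection and evaluation steps fit together.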