The early detection of type 2 diabetes is a major challenge for healthcare professionals, because a late diagnosis can lead to severe complications that are difficult to manage. In this context, this paper proposes a hybrid approach based on a Voting ensemble, designed to improve the accuracy of diabetes prediction. Our methodology consists of three main steps. First, we balanced the dataset classes with the SMOTEENN method, which oversamples the minority class and then removes samples that disagree with their nearest neighbours, so that the positive and negative classes are more evenly represented. Next, we used a Voting strategy to combine three complementary algorithms: Extra Trees Classifier (ETC), XGBoost (XGB), and K-Nearest Neighbors (KNN). This combination exploits the specific strengths of each model while mitigating their individual limitations. Finally, we applied GridSearch to tune the hyperparameters of the ensemble. In experiments on the Pima Indians Diabetes Dataset, our hybrid model achieves an accuracy of 95.50%, a precision of 93.22%, a recall of 98.21%, an F1-score of 95.65%, and an AUC-ROC of 98.83%. These results exceed those of the individual models, demonstrating the potential of this approach for building reliable and effective tools for the early diagnosis of type 2 diabetes.
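The SMOTEENN balancing step combines SMOTE oversampling with Edited Nearest Neighbours (ENN) cleaning. The pure-Python sketch below illustrates the mechanics on toy 2-D points under stated assumptions: binary labels, oversampling the minority class up to the majority count, and a majority-vote ENN rule with k = 3. The helper names (`nearest_indices`, `smote`, `enn`, `smote_enn`) and the toy data are hypothetical, and the code is not a reproduction of the authors' implementation; a real pipeline would more likely call a library class such as imbalanced-learn's `SMOTEENN`.

```python
import math
import random
from collections import Counter


def nearest_indices(points, query, k, exclude_index=None):
    """Indices of the k points closest to `query` (Euclidean distance)."""
    candidates = [i for i in range(len(points)) if i != exclude_index]
    candidates.sort(key=lambda i: math.dist(points[i], query))
    return candidates[:k]


def smote(minority, n_new, k=5, rng=None):
    """Create `n_new` synthetic minority points, each on the segment between a
    random minority point and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest_indices(minority, minority[i], k, exclude_index=i))
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(minority[i], minority[j])))
    return synthetic


def enn(X, y, k=3):
    """Edited Nearest Neighbours (assumed majority-vote variant): keep a sample
    only if its label matches the most common label of its k nearest neighbours."""
    kept_X, kept_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        neighbour_labels = [y[j] for j in nearest_indices(X, x, k, exclude_index=i)]
        if Counter(neighbour_labels).most_common(1)[0][0] == label:
            kept_X.append(x)
            kept_y.append(label)
    return kept_X, kept_y


def smote_enn(X, y, minority_label, k_smote=5, k_enn=3, seed=0):
    """Binary SMOTE + ENN: oversample the minority class up to the majority
    count, then drop samples that disagree with their neighbourhood."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max((len(y) - len(minority)) - len(minority), 0)
    new_points = smote(minority, n_new, k_smote, random.Random(seed))
    return enn(list(X) + new_points, list(y) + [minority_label] * len(new_points), k_enn)


# Hypothetical toy data: a label-0 cluster near the origin, three label-1 points
# near (5, 5), and one label-0 point placed inside the label-1 cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 2), (2, 0), (1, 2), (5.5, 5.5),
     (5, 5), (5, 6), (6, 5)]
y = [0] * 9 + [1] * 3
X_res, y_res = smote_enn(X, y, minority_label=1)
print(y_res.count(0), y_res.count(1))  # 8 9
```

SMOTE adds six synthetic label-1 points, and ENN then removes the label-0 point sitting inside the label-1 cluster. When a model is evaluated with cross-validation, resampling of this kind is normally applied inside each training fold only, so that the evaluation folds keep the original class distribution.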
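The abstract does not state whether the Voting ensemble uses hard voting (majority of predicted labels) or soft voting (average of predicted probabilities). The sketch below shows both rules for a single patient, assuming each base model (ETC, XGB, KNN) outputs a probability for the positive class; the probabilities, weights, function names and 0.5 threshold are hypothetical. In scikit-learn, the same choice is exposed through the `voting` parameter (`"hard"` or `"soft"`) of `VotingClassifier`.

```python
def hard_vote(probabilities, threshold=0.5):
    """Majority vote over the labels each base model predicts."""
    votes = [1 if p >= threshold else 0 for p in probabilities]
    return 1 if 2 * sum(votes) > len(votes) else 0


def soft_vote(probabilities, weights=None, threshold=0.5):
    """(Weighted) mean of the positive-class probabilities, then thresholded.
    Returns the predicted label and the mean probability."""
    weights = weights or [1.0] * len(probabilities)
    mean_p = sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)
    return (1 if mean_p >= threshold else 0), mean_p


# Hypothetical positive-class probabilities from ETC, XGB and KNN for one patient.
probs = [0.95, 0.45, 0.40]
print(hard_vote(probs))      # 0: only one of the three models predicts "diabetic"
print(soft_vote(probs)[0])   # 1: the confident ETC score lifts the mean to about 0.6
```

Soft voting lets one confident model outweigh two uncertain ones, whereas hard voting gives each model exactly one vote; which behaviour is preferable depends on how well calibrated the base models' probabilities are.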
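GridSearch evaluates every combination in a predefined hyperparameter grid and keeps the best-scoring one. The minimal sketch below shows that exhaustive loop; `grid_search`, the grid values and `toy_score` are hypothetical, and `toy_score` is only a stand-in for the cross-validated score a real search would compute. The `knn__n_neighbors` style of key mirrors how scikit-learn's `GridSearchCV` addresses parameters of named estimators inside a `VotingClassifier`.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in `param_grid`; return the best params and score."""
    keys = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(params)
        if score > best_score:  # strict '>' keeps the first best on ties
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid and stand-in scoring function (a real search would fit the
# ensemble and return, e.g., mean cross-validated accuracy).
grid = {"knn__n_neighbors": [3, 5, 7, 9], "xgb__max_depth": [3, 4, 6]}


def toy_score(p):
    return -abs(p["knn__n_neighbors"] - 7) - abs(p["xgb__max_depth"] - 4)


print(grid_search(grid, toy_score))  # ({'knn__n_neighbors': 7, 'xgb__max_depth': 4}, 0)
```

The cost grows multiplicatively with the number of values per parameter (here 4 x 3 = 12 evaluations), which is why grids are usually kept small.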
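Accuracy, precision, recall and F1-score are computed from the confusion matrix, and F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below computes these metrics from hypothetical counts and checks that the reported F1-score follows from the reported precision and recall. The function names and example counts are illustrative only; AUC-ROC is omitted because it requires the ranked prediction scores rather than confusion-matrix counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check with the precision (93.22%) and recall (98.21%) in the abstract.
print(round(100 * f1_from_precision_recall(0.9322, 0.9821), 2))  # 95.65
```

The check reproduces the reported F1-score of 95.65%, so the three reported figures are mutually consistent.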