Optimization of Malaria Diagnosis by Machine Learning According to the CRISP-DM Model Applied to the University Teaching Hospital Clinics of Lubumbashi (DRC)
Malaria remains a major public health challenge in the Democratic Republic of Congo (DRC), particularly in Lubumbashi, where traditional diagnostic methods are struggling to meet growing demand. The study was conducted at the University Clinics of Lubumbashi (UCL), the teaching hospital affiliated with the University of Lubumbashi. This work proposes an artificial intelligence (AI) expert system, developed following the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology, to optimize malaria diagnosis in this setting. Built on a decision tree classifier trained on local clinical data, the system achieved an accuracy of 90.4%, a recall (sensitivity) of 88%, and a specificity of 92%. The results demonstrate a substantial improvement in the speed and reliability of diagnosis and provide a transparent, interpretable decision-support tool suited to resource-limited healthcare environments.
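Accuracy, recall and specificity, the three figures reported above, are all computed from the confusion matrix of a binary classifier's predictions. The snippet below is a minimal sketch of those definitions using only the Python standard library. The class name `ConfusionMatrix` and all counts are hypothetical and are not the study's data. It also checks a useful identity: accuracy is not independent of the other two metrics, because it equals their average weighted by disease prevalence in the evaluation set.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical counts for a binary malaria classifier (positive = malaria)."""
    tp: int  # malaria cases correctly flagged as malaria
    fn: int  # malaria cases the classifier missed
    tn: int  # malaria-negative patients correctly ruled out
    fp: int  # malaria-negative patients wrongly flagged

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    def recall(self) -> float:
        # Sensitivity: share of true malaria cases that are detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of malaria-negative patients correctly ruled out.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Share of all patients classified correctly.
        return (self.tp + self.tn) / self.total

    def prevalence(self) -> float:
        # Share of patients in the sample who truly have malaria.
        return (self.tp + self.fn) / self.total


# Illustrative counts only (hypothetical, NOT taken from the UCL study).
cm = ConfusionMatrix(tp=45, fn=5, tn=40, fp=10)

print(f"recall={cm.recall():.2f} "
      f"specificity={cm.specificity():.2f} "
      f"accuracy={cm.accuracy():.2f}")

# Accuracy = prevalence * recall + (1 - prevalence) * specificity.
p = cm.prevalence()
weighted = p * cm.recall() + (1 - p) * cm.specificity()
assert math.isclose(cm.accuracy(), weighted)
```

When the classes are balanced, accuracy falls midway between recall and specificity. When the class mix is skewed, it drifts toward the metric of the majority class, which is why a diagnostic tool should report recall and specificity alongside accuracy.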
Cite this paper
Mazunze, B., Vicky, L. M., Franck, K. N., Pierre-Stéphane, M. M., Patrice, K. M. E., Desiré, K. D. and Eddy, M. S. (2025). Optimization of Malaria Diagnosis by Machine Learning According to the CRISP-DM Model Applied to the University Teaching Hospital Clinics of Lubumbashi (DRC). Open Access Library Journal, 12, e14143. doi: http://dx.doi.org/10.4236/oalib.1114143.