


A Machine Learning Classification Model for Detecting Prediabetes

DOI: 10.4236/jdaip.2024.123024, PP. 462-478

Keywords: Prediabetes, Machine Learning, SVM, Forest, Cumulative Lift


Abstract:

The incidence of prediabetes in the USA has reached a dangerous level. If this stage is ignored, the likelihood of developing chronic and complex health issues is very high. Early detection of prediabetes is therefore critical for reducing or avoiding type 2 diabetes and the other health problems that follow from undiagnosed and untreated prediabetes. This study aims to detect prediabetes with an artificial intelligence method. The data come from the Centers for Disease Control and Prevention’s (CDC) survey conducted by the Division of Health and Nutrition Examination Surveys (DHANES). Several machine learning algorithms are applied and compared on Average Squared Error (ASE), Kolmogorov-Smirnov (Youden) scores, area under the ROC curve, and other performance measures. Based on these scores, Random Forest is selected as the champion model, with approximately 89% accuracy.
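The three comparison measures named in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' pipeline: the labels, the predicted probabilities, and the model names `model_a` and `model_b` are invented toy values, and the function names are hypothetical. The Kolmogorov-Smirnov (Youden) score is taken here as the maximum of TPR minus FPR over all cutoffs, which is an assumption about how the paper computes it.

```python
from typing import Dict, List, Sequence, Tuple


def average_squared_error(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """Mean of (label - predicted probability)^2; lower is better."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def roc_points(y_true: Sequence[int], p_hat: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the cutoff through each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(p_hat, y_true), key=lambda t: -t[0])  # highest score first
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every observation tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def area_under_roc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC curve; higher is better."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def ks_youden(points: Sequence[Tuple[float, float]]) -> float:
    """Largest vertical gap between the ROC curve and the diagonal (max TPR - FPR)."""
    return max(tpr - fpr for fpr, tpr in points)


def evaluate(y_true: Sequence[int], p_hat: Sequence[float]) -> Dict[str, float]:
    pts = roc_points(y_true, p_hat)
    return {
        "ASE": average_squared_error(y_true, p_hat),
        "AUC": area_under_roc(pts),
        "KS": ks_youden(pts),
    }


# Toy data (invented): 1 = prediabetic, 0 = not prediabetic.
y = [1, 1, 0, 1, 0, 0]
scores = {
    "model_a": [0.9, 0.8, 0.7, 0.6, 0.3, 0.2],  # hypothetical candidate model
    "model_b": [0.6, 0.4, 0.5, 0.7, 0.3, 0.8],  # hypothetical candidate model
}
results = {name: evaluate(y, p) for name, p in scores.items()}

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

On this toy data `model_a` has the lower ASE and the higher AUC and KS, so it would be preferred on all three measures. With real data the measures can disagree, and the rule for weighing them against each other is a modelling decision that this sketch does not settle.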


