Identification of Cardiovascular Disease Populations in Chinese Communities Based on Machine Learning

DOI: 10.12677/acm.2025.151273, pp. 2059-2069

Keywords: Cardiovascular Disease, Machine Learning, Predictive Model


Abstract:

Objective: Cardiovascular disease is the leading cause of death among both urban and rural residents in China. Patients typically seek medical attention only after symptoms appear, and traditional diagnostic methods for cardiovascular disease are complex and costly. This study therefore aimed to identify patients with cardiovascular disease using general demographic characteristics, comorbidities, and routine physical examination and blood test indicators. Methods: The sample comprised 13,420 participants from the CHARLS database. After records with missing values were removed, models were built using logistic regression, decision trees, the K-nearest neighbor algorithm, random forests, and neural networks. The best model was selected by comparing the area under the receiver operating characteristic curve (ROC_AUC) and was then used to build subgroup models for each type of cardiovascular disease. The SHAP algorithm was used to interpret the models. Results: The logistic regression model performed best, with an ROC_AUC of 0.7644 (95% CI: 0.7397-0.7890). Among the subgroup models, identification of heart disease performed relatively well, with an ROC_AUC of 0.7747. The SHAP interpretation showed that age, body mass index, diabetes, and smoking history made important contributions to identifying cardiovascular disease. Conclusion: Machine learning methods can identify patients with cardiovascular disease, so that high-risk groups can be identified early from simple examination results and offered intervention.
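The abstract reports ROC_AUC values with a 95% confidence interval but does not state how the interval was obtained. The following is a minimal sketch of how such a figure could be computed, assuming a percentile bootstrap (an assumption, not the authors' stated method). The function names `roc_auc` and `bootstrap_auc_ci` and the synthetic scores are hypothetical and are not taken from the paper or from CHARLS.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney) formula; ties get average ranks.

    labels: list of 0/1 ints, scores: list of model outputs (higher = more likely 1).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic illustration only: scores drawn from two overlapping Gaussians.
rng = random.Random(42)
labels = [1] * 100 + [0] * 100
scores = [rng.gauss(0.6, 0.2) for _ in range(100)] + [rng.gauss(0.4, 0.2) for _ in range(100)]
auc = roc_auc(labels, scores)
low, high = bootstrap_auc_ci(labels, scores)
print(f"ROC_AUC = {auc:.4f} (95% CI: {low:.4f}-{high:.4f})")
```

In a model comparison like the one described, the same function would be applied to each candidate model's held-out predictions, and the model with the highest ROC_AUC would be carried forward to the subgroup analyses.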
