Analysis of Factors Influencing Coronary Heart Disease Based on Machine Learning
Abstract:
Objective: To analyze the region-specific risk factors that trigger coronary heart disease (CHD) and to derive practical recommendations for its diagnosis and management. Methods: A dual feature selection method, HC_MFS, which combines hierarchical clustering with the Fisher score, is proposed. Using data from 1,314 patients at a hospital in Shanghai collected between January 2020 and December 2022, 20 candidate influencing factors for CHD were analyzed, including C-reactive protein (CRP), platelet distribution width, and season. Results: HC_MFS achieved the best performance, with a highest accuracy of 83.84% obtained with the random forest model. CRP, LDL, TG, hypertension, season, and minimum temperature were identified as important risk factors. The advantage of HC_MFS was more pronounced after the class imbalance in the sample data was addressed: compared with the other methods, average accuracy improved by 11.25% and error fell by up to 13.16%. Conclusion: The onset of CHD in Chongming District is strongly associated not only with pathological factors such as CRP and LDL but also with local seasonal and climatic factors. HC_MFS offers a new machine-learning-based technique for CHD analysis and can provide scientific decision support for building regional medical resources and designing health management programs.
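The abstract names the two ingredients of HC_MFS (hierarchical clustering and the Fisher score) but does not spell out how they are combined. The following is a minimal sketch of one plausible reading, not the authors' implementation: features are grouped by average-linkage agglomerative clustering on the distance 1 - |r| (r being the Pearson correlation), and the feature with the highest Fisher score is kept from each group. The function names, the distance measure, the linkage rule, the one-feature-per-cluster policy, and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Fisher score of one feature: between-class scatter / within-class scatter."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    overall = fmean(values)
    between = sum(len(g) * (fmean(g) - overall) ** 2 for g in groups.values())
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return between / within if within > 0 else float("inf")


def abs_corr(a, b):
    """Absolute Pearson correlation between two feature columns."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def cluster_features(columns, n_clusters):
    """Average-linkage agglomerative clustering of features (assumed distance: 1 - |r|)."""
    names = list(columns)
    dist = {(a, b): 1.0 - abs_corr(columns[a], columns[b])
            for a in names for b in names}
    clusters = [[n] for n in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = fmean(dist[a, b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


def select_features(columns, labels, n_clusters):
    """Keep the highest-Fisher-score feature from each cluster (assumed policy)."""
    scores = {name: fisher_score(col, labels) for name, col in columns.items()}
    clusters = cluster_features(columns, n_clusters)
    return sorted(max(c, key=scores.get) for c in clusters)


# Toy values for illustration only; they are not taken from the study.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "crp": [1.0, 1.2, 0.9, 3.1, 2.8, 3.3],
    "crp_noisy": [1.5, 0.8, 1.3, 2.5, 3.4, 2.6],
    "pdw": [10, 12, 11, 11, 10, 12],
}
selected = select_features(columns, labels, n_clusters=2)
print(selected)
```

On this toy input the two correlated columns fall into one cluster, from which `crp` is kept because it separates the classes more cleanly, while `pdw` survives as the sole member of its own cluster. Keeping one feature per cluster removes redundancy, but it can also retain a weakly discriminative feature, so a real pipeline would likely add a score threshold before training a classifier.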