A Study on Forecasting Pulmonary Tuberculosis in China Based on a SARIMA-LSTM Model
Abstract:
Background: China is one of the countries with a high burden of tuberculosis (TB). Although the number of new pulmonary TB cases has declined year by year, the annual number of new infections remains high, and the diagnosis rate among people infected with TB is low. Objective: To identify a more accurate model for forecasting pulmonary TB incidence, providing a scientific basis for TB prevention, control, and early warning. Methods: SARIMA and LSTM models were built, and a SARIMA-LSTM combined model was constructed by weighted combination. The predictive performance of the models was compared using three evaluation metrics, mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE), to determine the best model, which was then used to forecast the trend of pulmonary TB incidence. Results: The MAPE values of the SARIMA model, the LSTM model, and the SARIMA-LSTM combined model were 17.95%, 14.62%, and 8.49%, respectively. The combined model reduced MAPE by 52.70% relative to the SARIMA model and by 41.89% relative to the LSTM model. Conclusion: The SARIMA-LSTM combined model fits the data better and has the lowest prediction error of the three models. By drawing on the strengths of each single model, the combined model improves forecasting accuracy over both single models.
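The abstract does not state the exact weighting rule or write out the metric formulas. The following Python sketch assumes one common choice for combined models, inverse-error weighting, in which each model's weight is proportional to the reciprocal of its sum of squared errors (SSE) on a validation window. The weighting rule, the function names (`inverse_sse_weights`, `combine`, `relative_reduction`), and the toy numbers are illustrative assumptions, not the authors' method or data. The sketch also computes MAE, RMSE, and MAPE, plus the relative MAPE reduction quoted in the Results.

```python
"""Sketch of a weighted SARIMA + LSTM forecast combination and its evaluation.

Assumption (hypothetical): weights are inversely proportional to each model's
SSE on a validation window; the paper's actual weighting rule may differ.
"""
import math
from typing import Sequence


def mae(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def rmse(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mape(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)


def inverse_sse_weights(actual, pred_a, pred_b):
    """Hypothetical weighting: w_a = (1/SSE_a) / (1/SSE_a + 1/SSE_b).

    Algebraically this equals SSE_b / (SSE_a + SSE_b), which also handles a
    model with zero error without dividing by zero.
    """
    sse_a = sum((x - p) ** 2 for x, p in zip(actual, pred_a))
    sse_b = sum((x - p) ** 2 for x, p in zip(actual, pred_b))
    total = sse_a + sse_b
    if total == 0:  # both models perfect: split evenly
        return 0.5, 0.5
    w_a = sse_b / total
    return w_a, 1 - w_a


def combine(pred_a, pred_b, w_a, w_b):
    """Weighted sum of two forecast series."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error metric, in percent."""
    return 100 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical toy values, NOT the study's data.
    actual = [100.0, 120.0, 90.0]
    sarima_pred = [110.0, 115.0, 95.0]
    lstm_pred = [95.0, 130.0, 80.0]

    w_s, w_l = inverse_sse_weights(actual, sarima_pred, lstm_pred)
    hybrid = combine(sarima_pred, lstm_pred, w_s, w_l)
    print(f"weights: SARIMA={w_s:.2f}, LSTM={w_l:.2f}")
    for name, pred in [("SARIMA", sarima_pred), ("LSTM", lstm_pred), ("Hybrid", hybrid)]:
        print(f"{name}: MAE={mae(actual, pred):.2f} "
              f"RMSE={rmse(actual, pred):.2f} MAPE={mape(actual, pred):.2f}%")

    # Relative MAPE reduction from the SARIMA and combined values in the abstract.
    print(f"MAPE reduction vs SARIMA: {relative_reduction(17.95, 8.49):.2f}%")
```

With these toy numbers the weights come out to 0.6 and 0.4, and the combined forecast [104, 121, 89] has a lower MAE than either input series; applying `relative_reduction` to the reported SARIMA and combined MAPE values reproduces the 52.70% figure. In a real study the weights would be estimated on data held out from the final test period, so that the reported test errors are not biased by the weighting step.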