Machine Learning-Based Outlier Detection in Long-Term Climate Data: Evidence from Burkina Faso’s Synoptic Network

DOI: 10.4236/acs.2025.153032, pp. 645-667

Keywords: Machine Learning, Climate Data, Anomaly Detection, Burkina Faso, PyOD

Abstract:

In recent decades, the impact of climate change on natural resources has increased, which makes reliable meteorological records increasingly important. However, collecting meteorological data poses several challenges, chief among them missing, outlying, or erroneous values. This work focuses on outlier detection in long-term climate data using machine learning models. The study uses meteorological data collected over 40 years (1981-2021) at ten synoptic stations operated by Burkina Faso’s National Meteorological Agency (ANAM). The methodology applies 18 machine learning algorithms from the PyOD library, spanning probabilistic, linear, proximity-based, and ensemble models. Both univariate and multivariate analyses are performed. For the multivariate analysis, the paper focuses on two key variables, maximum temperature and minimum relative humidity, which consistently exhibit strong correlations across all stations. A robust approach based on extreme-percentile thresholds is adopted to optimize outlier detection. The results show that models such as KPCA, LSCP, LOF, and Feature Bagging are best suited to capturing anomalies in complex time series. These results will contribute to more reliable climate analyses and improved modeling of extreme climate events in data-scarce regions.
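The percentile-threshold idea in the abstract can be illustrated with a small, dependency-free Python sketch. It assumes (the abstract does not confirm this) that a univariate series is screened by flagging values outside a 1st-99th percentile band, and that a multivariate detector scores each (maximum temperature, minimum relative humidity) pair and flags scores above their 99th percentile. The detector here is a from-scratch Local Outlier Factor (LOF), one of the models the study reports as effective; all function names, `k=5`, the percentile levels, and the toy data are hypothetical. In practice a PyOD detector would supply the scores, and PyOD's `contamination` parameter plays a similar role by placing the detector's threshold at the corresponding upper percentile of its training scores.

```python
import math


def percentile(values, q):
    """Linearly interpolated q-th percentile (numpy's default 'linear' rule)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def univariate_flags(values, low=1.0, high=99.0):
    """Indices of values outside the [low, high] percentile band (hypothetical rule)."""
    lower, upper = percentile(values, low), percentile(values, high)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def _knn(points, i, k):
    """k-distance of point i and its k nearest neighbours (ties at the k-distance kept)."""
    dists = sorted((math.dist(points[i], points[j]), j)
                   for j in range(len(points)) if j != i)
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]


def lof_scores(points, k=5):
    """Local Outlier Factor: about 1 for inliers, well above 1 for isolated points."""
    knn = [_knn(points, i, k) for i in range(len(points))]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(knn[o][0], math.dist(points[i], points[o])) for o in knn[i][1]]
        return 1.0 / (sum(reach) / len(reach) + 1e-12)

    lrds = [lrd(i) for i in range(len(points))]
    # LOF = mean density of the neighbours divided by the point's own density.
    return [sum(lrds[o] for o in nb) / (len(nb) * lrds[i])
            for i, (_, nb) in enumerate(knn)]


def multivariate_flags(points, k=5, q=99.0):
    """Indices whose LOF score exceeds the q-th percentile of all scores."""
    scores = lof_scores(points, k)
    threshold = percentile(scores, q)
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical toy records: (Tmax in degC, RHmin in %), negatively correlated.
records = [(30.0 + 0.5 * i, 40.0 - 1.2 * i + (0.8 if i % 2 else -0.8))
           for i in range(20)]
records.append((31.0, 5.0))  # breaks the toy Tmax/RHmin trend
tmax = [30.0 + 0.1 * i for i in range(50)] + [60.0]  # one implausible 60 degC spike

print(multivariate_flags(records))  # [20]
print(univariate_flags(tmax))       # [0, 50]
```

A percentile band always flags the most extreme observations of a sample, even valid ones: here the coldest toy reading (index 0) is flagged alongside the 60 °C spike. Flags from such a rule are therefore best treated as candidates for review rather than confirmed errors.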
