Multi-Task Gaussian Process for Imputing Missing Daily Rainfall Data Using Nearby Stations: Case of Burkina Faso

DOI: 10.4236/jst.2025.151001, pp. 1-13

Keywords: Rainfall, Missing Data, Multi-Task Gaussian Process, Correlation


Abstract:

Precipitation is a critical meteorological factor that significantly impacts agriculture in the sub-Saharan and Sahelian regions of Africa. Accurate knowledge of precipitation levels aids in planning effective agricultural strategies. However, these regions often face challenges with missing rainfall data at numerous gauges. This issue arises due to various factors, including socio-political instability (e.g., terrorism), economic constraints (e.g., insufficient station density due to limited resources), and human factors such as a shortage of qualified personnel. This study evaluates the effectiveness of the multi-task Gaussian process (MTGP) based on the linear model of coregionalization (LMC) for imputing missing daily rainfall data in Burkina Faso, leveraging the correlations among nearby stations. The proposed method is compared with commonly used statistical and machine learning techniques, including mean imputation (ME), K-nearest neighbors (KNN), Multivariate Imputation by Chained Equations (MICE), and Last Observation Carried Forward (LOCF). The results demonstrate that the MTGP approach outperforms MICE, KNN, LOCF, and ME. Additionally, when compared to the independent Gaussian process (IGP), which does not account for correlations between stations, MTGP shows a performance improvement of 50%.
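To illustrate why sharing information across stations can help, here is a minimal sketch of multi-task GP imputation using the single-term special case of the LMC (the intrinsic coregionalization model), in which the covariance between station i at day x and station j at day x' is B[i, j] * k(x, x'). Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic two-station series, the squared-exponential kernel, and the hand-picked values for the coregionalization matrix `B`, the lengthscale, and the noise. A real application would learn these hyperparameters from data and would have to deal with the zero-inflated, non-Gaussian nature of daily rainfall. Setting `B` to the identity recovers the independent GP baseline.

```python
import numpy as np

def rbf(x1, x2, lengthscale):
    """Squared-exponential kernel on 1-D inputs (here: day index)."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def icm_kernel(x1, t1, x2, t2, B, lengthscale):
    """Single-term LMC (ICM) covariance: B[task, task'] * k(x, x')."""
    return B[np.ix_(t1, t2)] * rbf(x1, x2, lengthscale)

def gp_impute(x_obs, t_obs, y_obs, x_new, t_new, B, lengthscale=3.0, noise=1e-2):
    """Posterior mean of a zero-mean GP at the (day, station) pairs to impute."""
    K = icm_kernel(x_obs, t_obs, x_obs, t_obs, B, lengthscale)
    K = K + noise * np.eye(len(x_obs))
    K_star = icm_kernel(x_new, t_new, x_obs, t_obs, B, lengthscale)
    return K_star @ np.linalg.solve(K, y_obs)

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# Synthetic, illustrative data: two correlated stations over 60 days.
rng = np.random.default_rng(0)
days = np.arange(60, dtype=float)
signal = np.sin(days / 5.0)
station_a = signal + 0.05 * rng.standard_normal(60)
station_b = signal + 0.05 * rng.standard_normal(60)

# Station A (task 0) has a 15-day gap; station B (task 1) is fully observed.
missing = np.arange(20, 35)
keep_a = np.setdiff1d(np.arange(60), missing)

x_obs = np.concatenate([days[keep_a], days])
t_obs = np.concatenate([np.zeros(len(keep_a), dtype=int), np.ones(60, dtype=int)])
y_obs = np.concatenate([station_a[keep_a], station_b])
x_new = days[missing]
t_new = np.zeros(len(missing), dtype=int)

# Hand-picked coregionalization matrices (assumed, not learned).
B_multi = np.array([[1.0, 0.95], [0.95, 1.0]])  # stations share information
B_indep = np.eye(2)                             # independent GP per station

pred_multi = gp_impute(x_obs, t_obs, y_obs, x_new, t_new, B_multi)
pred_indep = gp_impute(x_obs, t_obs, y_obs, x_new, t_new, B_indep)

rmse_multi = rmse(pred_multi, station_a[missing])
rmse_indep = rmse(pred_indep, station_a[missing])
print(f"RMSE multi-task: {rmse_multi:.3f}, independent: {rmse_indep:.3f}")
```

In this toy setting the independent model can only extrapolate from station A's own record on either side of the gap, whereas the multi-task model can also draw on station B's observations inside the gap. The size of the difference here reflects the synthetic setup and should not be read as reproducing the paper's reported results.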

