Modeling of Total Dissolved Solids (TDS) and Sodium Adsorption Ratio (SAR) in the Edwards-Trinity Plateau and Ogallala Aquifers in the Midland-Odessa Region Using Random Forest Regression and eXtreme Gradient Boosting
The importance of efficient water quality monitoring, and of ensuring drinking water safety, by government agencies cannot be overemphasized in areas where water resources are continually depleted by anthropogenic or natural factors. This is especially true of West Texas, and of Midland and Odessa in particular. Two machine learning regression algorithms, Random Forest and XGBoost, were used to develop models that predict total dissolved solids (TDS) and the sodium adsorption ratio (SAR) for efficient water quality monitoring of two vital aquifers: the Edwards-Trinity (Plateau) and Ogallala aquifers. These two aquifers have long supplied water for domestic, agricultural, industrial, and other uses. The data were obtained from the Texas Water Development Board (TWDB). Both the XGBoost and Random Forest models accurately predicted the observed TDS and SAR values for the Edwards-Trinity (Plateau) and Ogallala aquifers, with R² values consistently greater than 0.83. The Random Forest model gave better predictions of TDS and SAR, with an average R, MAE, RMSE, and MSE of 0.977, 0.015, 0.029, and 0.00, respectively. The XGBoost model achieved an average R, MAE, RMSE, and MSE of 0.953, 0.016, 0.037, and 0.00, respectively. Overall, both models performed well. These results indicate that Random Forest and XGBoost are suitable for water quality prediction and monitoring in areas of intense hydrocarbon activity such as Midland, Odessa, and West Texas at large.
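SAR is conventionally computed (the standard term is sodium adsorption ratio) from sodium, calcium, and magnesium concentrations expressed in milliequivalents per litre, as SAR = Na⁺ / √((Ca²⁺ + Mg²⁺) / 2). The Python sketch below shows that calculation, assuming the laboratory values arrive in mg/L and are converted with standard equivalent weights (molar mass divided by ionic charge). The function names are hypothetical and are not taken from the study's own processing code.

```python
import math

# Equivalent weight (mg per meq) = molar mass (g/mol) / ionic charge.
EQUIVALENT_WEIGHT_MG_PER_MEQ = {
    "Na": 22.990 / 1,
    "Ca": 40.078 / 2,
    "Mg": 24.305 / 2,
}


def to_meq_per_l(ion: str, mg_per_l: float) -> float:
    """Convert a Na, Ca or Mg concentration from mg/L to meq/L."""
    return mg_per_l / EQUIVALENT_WEIGHT_MG_PER_MEQ[ion]


def sodium_adsorption_ratio(na_mg_l: float, ca_mg_l: float, mg_mg_l: float) -> float:
    """Hypothetical helper: SAR = Na / sqrt((Ca + Mg) / 2), ions in meq/L."""
    na = to_meq_per_l("Na", na_mg_l)
    ca = to_meq_per_l("Ca", ca_mg_l)
    mg = to_meq_per_l("Mg", mg_mg_l)
    # The divalent cations must be present, otherwise the ratio is undefined.
    if ca + mg <= 0:
        raise ValueError("Ca + Mg must be positive to compute SAR")
    return na / math.sqrt((ca + mg) / 2)
```

For example, a sample with 230 mg/L Na, 80 mg/L Ca, and 24.3 mg/L Mg gives an SAR of about 5.8. Keeping the unit conversion in its own function makes it easy to skip when the source data are already reported in meq/L.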
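The error statistics reported for both models can be computed from paired observed and predicted values. The following standard-library Python sketch computes the Pearson correlation R, MAE, MSE, RMSE, and R² taken as the coefficient of determination (1 - SS_res/SS_tot). That definition of R² is an assumption: the study may instead have reported the square of the Pearson correlation, and the function name is hypothetical. The sketch also explains why an MSE of 0.00 can appear beside a non-zero RMSE: MSE is the square of RMSE, so an RMSE of 0.029 corresponds to an MSE of about 0.00084, and an RMSE of 0.037 to about 0.0014, both of which round to 0.00.

```python
import math
from typing import Dict, Sequence


def regression_metrics(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return R (Pearson), R2 (coefficient of determination), MAE, MSE and RMSE."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)

    # Residual-based error measures.
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # Pearson correlation between observed and predicted values.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    if ss_o == 0 or ss_p == 0:
        raise ValueError("R is undefined for a constant series")
    r = cov / math.sqrt(ss_o * ss_p)

    # Coefficient of determination: 1 - SS_res / SS_tot.
    r2 = 1 - (mse * n) / ss_o
    return {"R": r, "R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}
```

Applied to observed values [1, 2, 3, 4] and predictions [1.1, 1.9, 3.2, 3.8], it returns approximately MAE = 0.15, MSE = 0.025, and R² = 0.98. Reporting RMSE alongside MSE is usually more informative when errors are small, because squaring pushes MSE toward zero.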