Spatial heterogeneity refers to the variation in characteristics or features across different locations in space. Spatial data, also known as geospatial data or geographic information, is information tied explicitly or implicitly to a particular geographic region or location. Focusing on spatial heterogeneity, we present a hybrid machine learning model that combines two competitive algorithms: the Random Forest (RF) regressor and the Convolutional Neural Network (CNN). The model is tuned using cross-validation for hyperparameter adjustment and performance evaluation, to support robustness and generalization. Our approach uses global Moran’s I to examine global spatial autocorrelation and local Moran’s I to assess local spatial autocorrelation in the residuals. To validate the approach, we applied the hybrid model to a real-world dataset and compared its performance with that of the traditional machine learning models. The hybrid achieved an R-squared of 0.90, outperforming RF (0.84) and CNN (0.74). Assessed using the Root Mean Squared Error (RMSE), the hybrid also yielded lower errors, with a deviation of 53.65% from the RF model and 63.24% from the CNN model. Additionally, the global Moran’s I index was observed to be 0.10. This study contributes to a detailed understanding of spatial variation in data by taking into account the geographical information (longitude and latitude) present in the dataset. The results indicate that the hybrid predicted house prices well both in clustered and in dispersed areas.
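As an illustration of the residual diagnostics mentioned above, the following is a minimal sketch of global and local Moran’s I computed with NumPy. It is not the study's implementation: the k-nearest-neighbour, row-standardised weight matrix, the choice k = 4, the helper names, and the synthetic coordinates and residuals are all assumptions made for the example.

```python
import numpy as np

def knn_weights(coords, k=4):
    """Row-standardised k-nearest-neighbour weights (assumed scheme, not from the study)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances; a rough proxy when coords are longitude/latitude.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.arange(n)[:, None], nearest] = 1.0
    return w / w.sum(axis=1, keepdims=True)

def global_morans_i(values, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the mean-centred values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)

def local_morans_i(values, w):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, with m2 the mean of z squared."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    m2 = (z @ z) / len(z)
    return z * (w @ z) / m2

# Synthetic stand-ins: hypothetical locations and model residuals (y_true - y_pred).
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(200, 2))
residuals = rng.normal(0.0, 1.0, size=200)

w = knn_weights(coords, k=4)
i_global = global_morans_i(residuals, w)
i_local = local_morans_i(residuals, w)
print(f"global Moran's I: {i_global:.3f}")
print(f"largest local Moran's I: {i_local.max():.3f}")
```

A global value near zero suggests little remaining spatial structure in the residuals, while large positive local values flag locations whose residuals resemble those of their neighbours. With row-standardised weights, the global statistic equals the mean of the local values, which gives a quick consistency check.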