Feature selection is crucial for obtaining meaningful and interpretable results from a clustering analysis. In soil data clustering, however, it is not well understood how clustering performance responds to different feature subsets. In this paper, we analyzed the performance differences among the k-means, fuzzy c-means, and spectral clustering algorithms under different feature subsets of soil data sets. The experimental results showed that the spectral clustering algorithm generally outperformed k-means and fuzzy c-means across the feature subsets. Feature subsets containing environmental attributes improved clustering performance more than those containing spatial attributes, and they produced more accurate and meaningful clustering results. Our results indicate that combining the spectral clustering algorithm with feature subsets containing environmental rather than spatial attributes may be the better choice in soil data clustering applications.
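The abstract compares k-means, fuzzy c-means, and spectral clustering across feature subsets but does not spell out the procedure. The following is a hypothetical toy illustration of that kind of comparison, not the authors' code or data. It assumes two synthetic soil types that differ in two made-up environmental attributes but are scattered at random over space. Each feature subset is standardized, clustered with each algorithm, and scored against the known types using the Rand index. Several choices are illustrative assumptions and may differ from the paper: the spectral variant (a Ng-Jordan-Weiss-style normalized Gaussian affinity with width `sigma`), the farthest-first k-means initialization, the fuzzifier `m = 2`, and every function name.

```python
import numpy as np


def standardize(X):
    """Z-score each column so that no single attribute dominates the distances."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with a farthest-first initialization (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def fuzzy_cmeans(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns hard labels from the largest membership."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                  # memberships of each sample sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted centroids
        d = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
        d = np.fmax(d, 1e-12)                          # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)   # u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U.argmax(axis=1)


def spectral_clustering(X, k, sigma=1.0, seed=0):
    """Normalized spectral clustering in the style of Ng, Jordan and Weiss."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq / (2.0 * sigma ** 2))               # Gaussian affinity matrix
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(M)                        # eigenvalues in ascending order
    Y = vecs[:, -k:]                                   # k leading eigenvectors
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)   # project rows onto the unit sphere
    return kmeans(Y, k, seed=seed)


def rand_index(a, b):
    """Fraction of sample pairs on which two partitions agree (label-name invariant)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float((same_a == same_b)[iu].mean())


def make_toy_data(n_per_type=100, seed=42):
    """Hypothetical data: two soil types that differ in two environmental
    attributes but are scattered at random over a 10 x 10 spatial extent."""
    rng = np.random.default_rng(seed)
    truth = np.repeat([0, 1], n_per_type)
    env = np.array([[0.0, 0.0], [4.0, 4.0]])[truth] + rng.normal(size=(2 * n_per_type, 2))
    spatial = rng.uniform(0.0, 10.0, size=(2 * n_per_type, 2))
    return env, spatial, truth


env, spatial, truth = make_toy_data()
subsets = {"environmental": standardize(env), "spatial": standardize(spatial)}
algorithms = {"k-means": kmeans, "fuzzy c-means": fuzzy_cmeans, "spectral": spectral_clustering}
scores = {(s, a): rand_index(truth, algo(X, 2))
          for s, X in subsets.items() for a, algo in algorithms.items()}
for (s, a), ri in scores.items():
    print(f"{s:>13} | {a:<13} | Rand index = {ri:.3f}")
```

In this construction the synthetic spatial coordinates carry no information about the soil types. Every algorithm should therefore land near a Rand index of 0.5 on the spatial subset and close to 1 on the environmental subset. This shows only the mechanism by which the choice of feature subset can matter. It does not reproduce the paper's data, its ranking of the three algorithms, or its results.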