Clustering is an unsupervised machine learning technique that organizes unlabeled data into groups based on similarity. This paper applies the K-means and fuzzy C-means clustering algorithms to a vehicle crash dataset to explore patterns in the data. K-means assigns each data point to the cluster whose centroid it is most similar to, which partitions the data into distinct, non-overlapping clusters. Fuzzy C-means, by contrast, allows a data point to belong to multiple clusters simultaneously with varying degrees of membership, providing a more nuanced representation of the data. Results show that while K-means clustering is simpler and easier to interpret, fuzzy C-means clustering offers more flexibility and can handle situations where a data point may belong to more than one cluster.
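The difference between the two assignment rules can be illustrated with a minimal sketch. It is not the paper's implementation: the toy points, the fixed centroids, the function names, and the fuzzifier value m = 2 are all assumptions chosen for illustration, and the centroid update step of both algorithms is omitted. The membership formula is the standard fuzzy C-means one, in which each membership is proportional to the distance to the centroid raised to the power -2/(m - 1).

```python
import numpy as np

def hard_assign(points, centroids):
    """K-means style assignment: index of the nearest centroid for each point."""
    # Pairwise Euclidean distances, shape (n_points, n_centroids).
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def fuzzy_memberships(points, centroids, m=2.0, eps=1e-12):
    """Fuzzy C-means style memberships for fixed centroids; each row sums to 1."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    # Guard against division by zero when a point coincides with a centroid.
    dists = np.maximum(dists, eps)
    # Membership is proportional to distance ** (-2 / (m - 1)); m is the fuzzifier.
    inv = dists ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Hypothetical 2-D toy points, not taken from the crash dataset.
points = np.array([[0.0, 0.0], [4.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
# Centroids are fixed by hand here (an assumption); neither algorithm is iterated.
centroids = np.array([[0.0, 0.0], [4.0, 0.0]])

labels = hard_assign(points, centroids)
memberships = fuzzy_memberships(points, centroids)

print(labels)
print(memberships.round(3))
```

In this toy setup the point (1, 0) receives a single hard label from the K-means rule but memberships of 0.9 and 0.1 from the fuzzy rule, and the point (2, 0), which is equidistant from both centroids, receives memberships of 0.5 each while the hard rule must still pick one cluster.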
Cite this paper
Abdulhafedh, A. (2025). Applying K-Means Clustering and Fuzzy C-Means Clustering in Vehicle Crashes. Open Access Library Journal, 12, e2856. doi: http://dx.doi.org/10.4236/oalib.1112856.