An Experiment of K-Means Initialization Strategies on Handwritten Digits Dataset

DOI: 10.4236/iim.2018.102003, pp. 43-48

Keywords: K-means, Clustering Performance Evaluation, Machine Learning, Principal Component Analysis

Abstract:

Clustering is an important unsupervised classification method which divides data into different groups based on some similarity metric. K-means has become an increasingly popular clustering method and is widely used in different applications. The centroid initialization strategy is a key step in K-means clustering. In general, K-means has three efficient initialization strategies to improve its performance, i.e., random, K-means++ and PCA-based initialization. In this paper, we design an experiment to evaluate these three strategies on the UCI ML hand-written digits dataset. The experimental results show that the three initialization strategies find almost identical cluster centroids and yield almost the same clustering results, but the PCA-based strategy significantly reduces running time and is faster than the other two.
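
The comparison described above can be reproduced with off-the-shelf tools. What follows is a minimal sketch, not the authors' actual code: it assumes scikit-learn's KMeans implementation and the bundled copy of the UCI hand-written digits dataset, and it uses wall-clock time and inertia as rough stand-ins for the running-time and clustering-quality comparison in the paper.

    # Minimal sketch (assumed setup, not the paper's code): compare the three
    # K-means initialization strategies on the UCI hand-written digits data.
    from time import time

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import scale

    data = scale(load_digits().data)   # 1797 samples, 64 features, standardized
    n_digits = 10                      # one cluster per digit class

    def benchmark(name, estimator):
        # Fit the estimator and report wall-clock time and final inertia
        # (sum of squared distances of samples to their closest centroid).
        t0 = time()
        estimator.fit(data)
        print(f"{name:>10s}  time: {time() - t0:.3f}s  inertia: {estimator.inertia_:.0f}")

    # 1) Random initialization: centroids drawn from the data at random.
    benchmark("random", KMeans(n_clusters=n_digits, init="random",
                               n_init=10, random_state=0))

    # 2) K-means++ seeding: probabilistic seeding that spreads centroids out.
    benchmark("k-means++", KMeans(n_clusters=n_digits, init="k-means++",
                                  n_init=10, random_state=0))

    # 3) PCA-based initialization: seed the centroids with the first
    #    n_digits principal components (deterministic, so n_init=1).
    pca = PCA(n_components=n_digits).fit(data)
    benchmark("PCA-based", KMeans(n_clusters=n_digits, init=pca.components_,
                                  n_init=1))

Because the PCA seeding is deterministic, K-means runs only once from a single starting point, which is where the running-time advantage reported in the abstract comes from.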

