Application Research of Computers (计算机应用研究), 2012
K-means clustering algorithm based on optimal initial centers related to pattern distribution of samples in space
Abstract:
To overcome the sensitivity of the traditional K-means clustering algorithm to its initial centers, and to avoid the arbitrariness of existing improved K-means algorithms in selecting good initial centers, this paper proposes a new algorithm for finding optimal initial centers for K-means clustering. The algorithm defines a density and a neighborhood for each sample according to the natural pattern distribution of the samples in data space, so that the samples chosen as initial seeds not only lie in higher-density areas but are also far away from each other. The new algorithm was tested on several well-known datasets from the UCI Machine Learning Repository and on synthetic datasets with different proportions of noise, using a variety of evaluation measures. The experimental results demonstrate that the new algorithm achieves excellent clustering results in a short run time and is insensitive to noisy data. It outperforms both the traditional K-means clustering algorithm and existing algorithms for improving the initial seeds of K-means.
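
The abstract does not give the paper's exact definitions of density and neighborhood, so the following Python sketch only illustrates the general idea under stated assumptions: the neighborhood of a sample is taken to be all samples within a fixed `radius`, density is taken to be the size of that neighborhood, and seeds are picked greedily from the densest samples while skipping any sample that falls inside the neighborhood of a seed already chosen. The function names `density_based_seeds` and `kmeans` and the `radius` parameter are hypothetical and are not taken from the paper.

```python
import math


def density_based_seeds(points, k, radius):
    """Pick up to k seeds that are dense and mutually far apart.

    Assumption (not from the paper): neighborhood = samples within `radius`,
    density = neighborhood size.
    """
    n = len(points)
    # neighborhoods[i] holds the indices of all samples within `radius` of sample i.
    neighborhoods = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= radius]
        for i in range(n)
    ]
    density = [len(nb) for nb in neighborhoods]
    # Visit samples from densest to sparsest (ties keep their input order).
    order = sorted(range(n), key=lambda i: density[i], reverse=True)

    excluded = set()
    seeds = []
    for i in order:
        if i in excluded:
            continue  # too close to a seed that was already chosen
        seeds.append(points[i])
        excluded.update(neighborhoods[i])
        if len(seeds) == k:
            break
    # May return fewer than k seeds if `radius` is too large for the data.
    return seeds


def kmeans(points, centers, iterations=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep empty clusters' centers.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

A typical call would be `centers, clusters = kmeans(points, density_based_seeds(points, k, radius))`. Because an isolated noise sample has a small neighborhood, it is visited last and is unlikely to be chosen as a seed, which is one plausible reading of why a density-based initialization is less sensitive to noise than random seeding.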