Journal of the Graduate School of the Chinese Academy of Sciences (中国科学院研究生院学报), 2007
An Improved K-means Algorithm Based on Optimizing Initial Points
Abstract:
K-means is an important clustering algorithm that is widely used in Internet information processing. Because the procedure terminates at a local optimum, K-means is sensitive to its initial starting conditions. An improved algorithm is proposed that searches for the relatively dense regions of the database and then generates the initial points from them. By effectively excluding the influence of edge points and outliers, the method can achieve higher clustering accuracy and also adapts to databases with highly skewed density distributions.
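The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of seeding K-means from dense regions, not the authors' algorithm. The function name `density_based_init`, the parameter `n_neighbors`, the use of the k-th nearest-neighbor distance as a density proxy, the "keep the denser half" cutoff, and the farthest-first spreading step are all assumptions made for illustration.

```python
import math


def density_based_init(points, k, n_neighbors=5):
    """Pick k initial centers from dense regions (illustrative heuristic only)."""
    if not 0 < n_neighbors < len(points):
        raise ValueError("n_neighbors must be between 1 and len(points) - 1")
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    def knn_distance(p):
        # Distance to the n_neighbors-th nearest neighbor; index 0 is p itself.
        return sorted(math.dist(p, q) for q in points)[n_neighbors]

    # A small k-NN distance means a high local density (assumed density proxy).
    radius = [knn_distance(p) for p in points]
    by_density = sorted(range(len(points)), key=lambda i: radius[i])

    # Assumed cutoff: keep the denser half as candidates, which drops
    # sparse edge points and outliers from consideration.
    candidates = by_density[: max(k, len(points) // 2)]

    # Start from the densest point, then repeatedly add the candidate that is
    # farthest from the centers chosen so far, so the seeds spread out.
    centers = [points[candidates[0]]]
    while len(centers) < k:
        farthest = max(
            candidates,
            key=lambda i: min(math.dist(points[i], c) for c in centers),
        )
        centers.append(points[farthest])
    return centers


# Hypothetical data: three compact 3x3 grids of points plus one far outlier.
grid_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
offsets = (-0.5, 0.0, 0.5)
data = [(cx + dx, cy + dy) for cx, cy in grid_centers for dx in offsets for dy in offsets]
data.append((50.0, 50.0))  # outlier that a naive farthest-first choice would pick

print(density_based_init(data, k=3, n_neighbors=3))
```

The returned points would then be passed as the starting centers to the usual K-means iterations. This sketch computes all pairwise distances, so it is suited only to small data sets.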