Application Research of Computers (计算机应用研究), 2010
K-means algorithm based on data field
Abstract:
The K-means algorithm has several limitations: the initial class centres are chosen at random, the algorithm is overly sensitive to noise and outliers, and it is not applicable when the clusters differ greatly in shape. To address these deficiencies, this paper draws on the molecular interaction model, treating each text as a data point in a data field, and, considering the overall similarity and difference of the texts, proposes a new formula for computing the data potential. The formula makes it possible to discard outliers and to determine the initial class centres according to the potential of the document data. Experiments show that the improved K-means algorithm converges faster, eliminates the adverse effect of noise and outliers on the clustering results, and improves clustering precision. The improved K-means algorithm is therefore well suited to data with non-uniform subject distributions.
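The idea of potential-based initialization can be illustrated with a minimal sketch. The abstract does not state the paper's potential formula, so the code below assumes a commonly used Gaussian-like data field potential; the function names, the `sigma` and `outlier_fraction` parameters, and the separation rule `2 * sigma` are hypothetical choices made for illustration only and are not the authors' method.

```python
import numpy as np

def data_field_potential(X, sigma=1.0):
    """Potential of each point under an ASSUMED Gaussian-like data field."""
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Each point sums the influence of all points (including itself).
    return np.exp(-sq_dist / sigma ** 2).sum(axis=1)

def select_initial_centres(X, k, sigma=1.0, outlier_fraction=0.1):
    """Drop the lowest-potential points, then pick k well-separated peaks."""
    phi = data_field_potential(X, sigma)
    order = np.argsort(phi)                  # ascending potential
    n_drop = int(outlier_fraction * len(X))  # hypothetical outlier rule
    outliers = order[:n_drop]
    candidates = order[n_drop:][::-1]        # highest potential first
    min_sep = 2 * sigma                      # hypothetical separation rule
    centres = []
    for idx in candidates:
        # Accept a candidate only if it is far from every chosen centre.
        if all(np.linalg.norm(X[idx] - X[c]) > min_sep for c in centres):
            centres.append(idx)
        if len(centres) == k:
            break
    if len(centres) < k:
        raise ValueError("could not find k sufficiently separated centres")
    return np.array(centres), outliers
```

The returned indices would then seed an ordinary K-means run in place of random initial centres, with the flagged outliers left out of the clustering.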