%0 Journal Article
%T K-means algorithm based on data field
一种基于数据场的K-均值算法
%A 简艳
%A 贾洪勇
%J 计算机应用研究
%D 2010
%I
%X The K-means algorithm has several limitations: its initial cluster centres are chosen at random, it is overly sensitive to noise and outliers, and it is poorly suited to clusters that differ greatly in shape. To address these deficiencies, this paper draws on the molecular interaction model, treating each text as a data point in a data field, and proposes a new formula for computing the data potential that takes into account the overall similarity and difference among texts. The formula can be used to remove outliers and to determine the initial cluster centres according to the potential of the document data. Experiments show that the improved K-means algorithm converges faster, eliminates the adverse effect of noise and outliers on the clustering results, and improves clustering precision. The improved K-means algorithm is therefore well suited to non-uniform topic distributions.
%K K-means
%K interaction force among molecules
%K data field
%K text clustering
%K K-均值
%K 分子间相互作用力
%K 数据场
%K 文本聚类
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=A9D9BE08CDC44144BE8B5685705D3AED&aid=9C4EF06237A8D71FF4067FC3B09F207D&yid=140ECF96957D60B2&vid=DB817633AA4F79B9&iid=59906B3B2830C2C5&sid=4AD0623CFDD74154&eid=A54D728F61BBA606&journal_id=1001-3695&journal_name=计算机应用研究&referenced_num=0&reference_num=10