%0 Journal Article
%T Tri-Training and Data Editing Based Semi-Supervised Clustering Algorithm
基于Tri-Training和数据剪辑的半监督聚类算法
%A DENG Chao
%A GUO Mao-Zu
%A 邓超
%A 郭茂祖
%J 软件学报
%D 2008
%X This paper proposes DE-Tri-training semi-supervised K-means, an algorithm that obtains a larger seeds set with less noise. Before the seeds set is used to initialize the cluster centroids, the training process of Tri-training, a semi-supervised classification approach, labels unlabeled data and adds them to the initial seeds set to enlarge it. To improve the quality of the enlarged seeds set, Depuration, a data editing technique based on the nearest neighbor rule, is introduced into the Tri-training process to remove or relabel mislabeled examples among the enlarged seeds. Experimental results show that the proposed semi-supervised clustering algorithm effectively improves cluster centroid initialization and enhances clustering performance.
%K semi-supervised clustering
%K semi-supervised classification
%K K-means
%K seeds set
%K Tri-training
%K depuration data editing
%K 半监督聚类
%K 半监督分类
%K K-均值
%K seeds集
%K Tri-Training
%K Depuration数据剪辑
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=B9547C6A60D120A0941E4C8B2EB71AFD&yid=67289AFF6305E306&vid=2A8D03AD8076A2E3&iid=38B194292C032A66&sid=46FF101E7ECF9F15&eid=B28C697BC3A1BA62&journal_id=1000-9825&journal_name=软件学报&referenced_num=0&reference_num=23