%0 Journal Article
%T Novel semi-supervised clustering algorithm based on active data selection
%T 基于主动数据选取的半监督聚类算法
%A WEN Ping
%A LENG Ming-wei
%A CHEN Xiao-yun
%A 文平
%A 冷明伟
%A 陈晓云
%J Application Research of Computers (计算机应用研究)
%D 2012
%X Semi-supervised clustering, which aims to improve clustering results significantly by using limited supervision, has become a research focus in data mining and machine learning in recent years. However, the accuracy of existing semi-supervised clustering algorithms is low on datasets with little labeled data and on multi-density or unbalanced datasets. Building on active learning, this paper studies data selection and presents a novel semi-supervised clustering algorithm. The algorithm selects information-rich data points as labeled data by combining the ideas of minimum spanning tree clustering and active learning, and then uses a KNN-like technique to propagate labels. Experiments on several UCI standard datasets and synthetic datasets show that the proposed method achieves markedly higher accuracy and stable performance compared with other methods, even when the datasets are multi-density and unbalanced.
%K data mining
%K semi-supervised clustering
%K active learning
%K labeled data
%K data selection
%K minimum spanning tree
%K multi-density dataset
%K unbalanced dataset
%K 数据挖掘
%K 半监督聚类
%K 主动学习
%K 标签数据
%K 数据选取
%K 最小生成树
%K 多密度数据集
%K 不平衡数据集
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=A9D9BE08CDC44144BE8B5685705D3AED&aid=C2AB4F886AE798C9E9CFFB8F4A4841AC&yid=99E9153A83D4CB11&vid=771469D9D58C34FF&iid=5D311CA918CA9A03&sid=7910A8FC839BA1C2&eid=50702E31696D48BF&journal_id=1001-3695&journal_name=计算机应用研究&referenced_num=0&reference_num=16