|
- 2018
Semi-supervised constraint ensemble clustering by fast search and find of density peaks
DOI: 10.3785/j.issn.1008-973X.2018.11.018
Abstract: The clustering by fast search and find of density peaks (CFDP) algorithm, published in Science in 2014, has three weaknesses: its automatic selection can pick wrong cluster centers or miss true ones, the number of clusters must be judged subjectively in advance, and its applicability is limited to certain scenarios. To address these weaknesses, a semi-supervised constraint ensemble clustering by fast search and find of density peaks (SiCE-CFDP) algorithm was proposed, combining a semi-supervised approach with ensemble learning. SiCE-CFDP measures node density using relative density, analyzes the decision graph from multiple perspectives to select candidate cluster centers automatically, and finally determines the number of clusters automatically. When only a limited number of constraint relations are labeled, the algorithm uses ensemble learning to guide the expansion of the constraint information and thereby improve clustering performance. Simulation experiments were conducted on three synthetic datasets, four public datasets, and one air-conditioning system dataset. The results show that, given the same number of constraints, SiCE-CFDP achieves higher clustering accuracy on large-sample data than other semi-supervised clustering algorithms.
|
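For orientation, the decision graph of the baseline CFDP method that SiCE-CFDP builds on can be sketched as follows. This is only a minimal illustration of the original 2014 approach with a hard cutoff kernel, not the SiCE-CFDP algorithm: the relative density measure, the automatic center selection, and the constraint ensemble are not specified in the abstract and are not reproduced here. The function names `decision_graph` and `pick_centers`, the cutoff distance `d_c`, and the user-supplied cluster count `k` are assumptions made for the example.

```python
import numpy as np


def decision_graph(points, d_c):
    """Compute the two baseline CFDP quantities for every point.

    rho[i]   : local density, the number of other points closer than d_c.
    delta[i] : distance to the nearest point of strictly higher density;
               for a point with no denser neighbor, its largest distance
               to any point.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distance matrix, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Hard cutoff kernel; subtract 1 so a point does not count itself.
    rho = (dist < d_c).sum(axis=1) - 1

    delta = np.empty(len(points))
    for i in range(len(points)):
        higher = rho > rho[i]
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            delta[i] = dist[i].max()
    return rho, delta


def pick_centers(rho, delta, k):
    """Return the indices of the k points with the largest rho * delta.

    Supplying k by hand is an assumption of this sketch; it is the kind
    of subjective choice the abstract says SiCE-CFDP avoids.
    """
    gamma = rho * delta
    return np.argsort(-gamma)[:k]
```

Points with both a high density and a large distance to any denser point stand out in the plot of `delta` against `rho`, which is why the product of the two is used here as a simple ranking score. Ties in density are not handled specially in this sketch.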