- 2018
Incremental multi-view clustering algorithm based on kernel K-means
Abstract: The kernel-based multi-view clustering method (MVKKM) has a long running time on large-scale datasets. To address this, the incremental clustering model is introduced and combined with MVKKM, giving an incremental multi-view clustering algorithm based on kernel K-means (IMVKKM; abbreviated IMVCKM in the Chinese abstract). The dataset is divided into chunks and MVKKM is run on each chunk; the cluster centers obtained from one chunk serve as the initial cluster centers for the next chunk. The cluster centers of all chunks are then combined and clustered once more with multi-view clustering to obtain the final clustering result. Experiments on three large-scale datasets show that IMVKKM achieves better clustering results than MVKKM on three evaluation metrics, with a shorter running time. The proposed algorithm greatly reduces running time while maintaining clustering performance.
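The chunk-wise procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: an ordinary Euclidean K-means (Lloyd iterations) stands in for MVKKM, each sample is assumed to be a single feature vector (for example, the views concatenated), and the names `kmeans`, `assign`, `incremental_cluster`, `chunk_size`, and `seed` are hypothetical. It also assumes the first chunk holds at least `k` samples.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, centers, n_iter=20):
    """Plain Lloyd K-means started from the given centers.

    ASSUMPTION: this is only a stand-in for MVKKM, which is kernel-based
    and weights several views; here there is one view and no kernel.
    """
    centers = [list(c) for c in centers]
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def incremental_cluster(points, k, chunk_size, seed=0):
    """Chunk-wise clustering with warm-started centers, then a final pass."""
    rng = random.Random(seed)
    # Initial centers are drawn from the first chunk (an assumption; the
    # abstract does not say how the first chunk is initialised).
    centers = rng.sample(points[:chunk_size], k)
    pooled = []
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        # Centers from the previous chunk initialise the current chunk.
        centers = kmeans(chunk, centers)
        pooled.extend(centers)
    # Combine the centers of all chunks and cluster them once more.
    final_centers = kmeans(pooled, centers)
    return final_centers, assign(points, final_centers)
```

Calling `incremental_cluster(points, k=2, chunk_size=25)` on a list of feature vectors returns the final centers and one label per sample. Each pass touches at most `chunk_size` samples plus the pooled centers, which is where the incremental scheme is expected to save time; whether the real kernel-based method keeps explicit centers in this way is not stated in the abstract.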