
A spectral algorithm for text cluster ensembles based on affinity propagation

DOI: 10.3969/j.issn.1006-7043.201109001

Keywords: affinity propagation, cluster ensemble, text clustering, spectral clustering, matrix transformation


Abstract:

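Existing spectral algorithms for cluster ensembles produce unstable clustering results. To address this, the paper introduces the idea of affinity propagation clustering and designs an affinity-propagation-based cluster ensemble spectral algorithm (APCESA). The algorithm first uses the cluster ensemble and a spectral decomposition to obtain a low-dimensional embedding of the texts whose spatial structure is relatively simple, and then applies the affinity propagation algorithm to that embedding to obtain the final clustering. In the spectral decomposition step, a matrix transformation avoids the high computational cost of eigenvalue decomposition in spectral algorithms. Experiments on real text datasets show that the proposed algorithm clusters more stably than the comparison algorithms, and that its normalized mutual information (NMI) and average normalized mutual information (ANMI) values are both higher than theirs.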
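The abstract does not say which matrix transformation is used, so the following is only a minimal sketch of one common way to avoid a large eigenvalue decomposition, and it may differ from the paper's method. It assumes the ensemble is encoded as a cluster indicator matrix H (n documents by t clusters in total) and that the similarity matrix is H Hᵀ. The leading eigenvectors of the n × n matrix H Hᵀ are then recovered from the much smaller t × t matrix Hᵀ H. The names `incidence_matrix` and `spectral_embedding_small` and the toy labelings are hypothetical.

```python
import numpy as np

def incidence_matrix(labelings):
    """Stack one-hot cluster indicators of every base clustering.

    Returns an array of shape (n_documents, total_number_of_clusters).
    """
    blocks = []
    for labels in labelings:
        labels = np.asarray(labels)
        ids = np.unique(labels)
        blocks.append((labels[:, None] == ids[None, :]).astype(float))
    return np.hstack(blocks)

def spectral_embedding_small(H, k):
    """Top-k eigenpairs of H @ H.T, computed from the small matrix H.T @ H.

    Assumes k does not exceed the rank of H, so the selected
    eigenvalues are strictly positive.
    """
    small = H.T @ H                       # t x t instead of n x n
    vals, vecs = np.linalg.eigh(small)    # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]    # indices of the k largest
    vals, vecs = vals[order], vecs[:, order]
    U = H @ vecs / np.sqrt(vals)          # map back to n-dimensional eigenvectors
    return U, vals

# Toy ensemble (assumed data): three base clusterings of six documents.
labelings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
H = incidence_matrix(labelings)
U, vals = spectral_embedding_small(H, k=2)   # rows of U: low-dimensional embedding
```

Each row of `U` is the low-dimensional representation of one document. In a pipeline like the one the abstract describes, these rows would then be passed to an affinity propagation implementation (for example, scikit-learn's `AffinityPropagation`) to produce the final clusters.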

