全部 标题 作者
关键词 摘要

OALib Journal期刊
ISSN: 2333-9721
费用:99美元

查看量下载量

相关文章

更多...

基于矩阵谱分析的文本聚类集成算法*

, PP. 780-786

Keywords: 聚类分析,聚类集成,谱聚类,文本聚类,矩阵低秩近似

Full-Text   Cite this paper   Add to My Lib

Abstract:

聚类集成技术可有效提高单聚类算法的精度和稳定性,其中的关键问题是如何根据不同的聚类成员组合为更好的聚类结果.文中引入谱聚类算法解决文本聚类集成问题,设计基于正则化拉普拉斯矩阵的谱算法(NLMSA).该算法基于代数变换,通过求解小规模矩阵的特征值和特征向量间接获得正则化拉普拉斯矩阵的特征向量,并用于后续聚类.进一步研究谱聚类算法的关键思想,设计基于超边转移概率矩阵的谱算法(HTMSA).该算法通过求解超边的低维嵌入间接获得文本的低维嵌入,并用于后续K均值算法.在TREC和Reuters文本集上的实验结果验证NLMSA和HTMSA的有效性,它们都获得比其它基于图划分的集成算法更为优越的结果.HTMSA获得的结果比NLMSA略差,而时间和空间需求则比NLMSA低得多.

References

[1]  Tan P N, Steinbach M, Kumar V. Introduction to Data Mining. Toronto, Canada: Addison-Wesley Longman, 2005
[2]  Strehl A, Ghosh J. Cluster Ensembles—A Knowledge Reuse Framework for Combining Partitionings // Proc of the 11th Conference on Artificial Intelligence. Edmonton, Canada, 2002: 93-98
[3]  Fred A L, Jain A K. Combining Multiple Clusterings Using Evidence Accumulation. IEEE Trans on Pattern Analysis and Machine Intelligence, 2005, 27(6): 835-850
[4]  Fern X Z, Brodley C E. Solving Cluster Ensemble Problems by Bipartite Graph Partitioning // Proc of the 20th International Conference on Machine Learning. Banff, Canada, 2004: 36-43
[5]  Topchy A, Jain A K, Punch W. A Mixture Model for Clustering Ensembles // Proc of the 4th SIAM International Conference on Data Mining. Lake Buena Vista, USA, 2004: 379-390
[6]  Ayad H, Basir O A, Kamel M. A Probabilistic Model Using Information Theoretic Measures for Cluster Ensembles // Proc of the 5th International Workshop on Multiple Classifier Systems. Cagliari, Italy, 2004: 144-153
[7]  Tang Wei, Zhou Zhihua. Bagging-Based Selective Cluster Ensemble. Journal of Software, 2005, 16(4): 496-502 (in Chinese) (唐 伟,周志华.基于Bagging的选择性聚类集成.软件学报, 2005, 16(4): 496-502)
[8]  Fern X Z, Lin W. Cluster Ensemble Selection. Statistical Analysis and Data Mining. 2008, 1(3): 128-141
[9]  Luo Huilan, Kong Fansheng, Li Yixiao. An Analysis of Diversity Measures in Clustering Ensembles. Chinese Journal of Computers, 2007, 30(8): 1315-1324 (in Chinese) (罗会兰,孔繁胜,李一啸.聚类集成中的差异性度量研究.计算机学报, 2007, 30(8): 1315-1324)
[10]  Luxburg U V. A Tutorial on Spectral Clustering. Statistics and Computing, 2007, 17(4): 395-416
[11]  Shi Jianbo, Malik J. Normalized Cuts and Image Segmentation. IEEE Trans on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888-905
[12]  Ng A Y, Jordan M I, Weiss Y. On Spectral Clustering: Analysis and an Algorithm [EB/OL]. [2008-04-10]. http://books.nips.cc/papers/files/nips14/AA35.pdf
[13]  Meila M, Shi J. A Random Walks View of Spectral Segmentation // Proc of the 8th International Workshop on Artificial Intelligence and Statistics. Key West, USA, 2001: 31-37
[14]  Karypis G, Kumar V. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM Journal on Scientific Computing, 1998, 20(1): 359-392
[15]  Karypis G, Aggarwal R, Kumar V, et al. Multilevel Hypergraph Partitioning: Applications in VLSI Domain // Proc of the 34th Annual Conference on Design Automation. Anaheim, USA, 1997: 526-529
[16]  Dhillon I S. Co-Clustering Documents and Words Using Bipartite Spectral Graph Partitioning // Proc of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, USA, 2001: 269-274
[17]  von Luxburg U, Belkin M, Bousquet O. Consistency of Spectral Clustering. Annals of Statistics, 2008, 36(2): 555-586
[18]  Berry M W. Large-Scale Sparse Singular Value Computations. The International Journal of Supercomputer Applications, 1992, 6(1): 13-49
[19]  National Institute of Standards and Technology. Text Retrieval Conference [DB/OL]. [2007-11-20]. http://trec.nist.gov
[20]  Lewis D D. Reuters-21578~1.0 [DB/OL]. [2008-07-10]. http://www.daviddlewis.com/resources/testcouecthons/reaters21578

Full-Text

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133