2016

A text clustering algorithm based on find of density peaks

DOI: 10.6040/j.issn.1671-9352.1.2015.042

Keywords: document clustering, vector distance, density, feature term


Abstract:

This paper proposes a text clustering algorithm based on finding density peaks. The algorithm recasts the computation of distance and density between texts as a computation of similarity between text vectors. Each document is represented with the vector space model (VSM), and similarity is computed with the cosine formula. From these similarities, the algorithm obtains the local density of each document and its distance from documents of higher density. After noise points are removed, the cluster centers are selected, and each remaining non-center point is assigned to the cluster of its nearest cluster center. Several sets of comparative experiments verify the reliability and robustness of the method.
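The steps named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the feature weighting, the density kernel, the noise criterion, or the rule for choosing centers. The sketch assumes raw term-frequency vectors, distance defined as 1 minus cosine similarity, a cutoff-kernel density with a hypothetical cutoff `d_c`, a hypothetical `min_density` noise threshold, and centers chosen as the `n_clusters` points with the largest product of density and distance.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed weighting)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def density_peak_clusters(texts, d_c=0.5, n_clusters=2, min_density=1):
    """Return (labels, centers); label -1 marks a noise point.

    d_c, n_clusters and min_density are illustrative parameters,
    not values taken from the paper.
    """
    vecs = [tf_vector(t) for t in texts]
    n = len(vecs)
    # Assumed distance: 1 - cosine similarity.
    dist = [[1.0 - cosine_similarity(vecs[i], vecs[j]) for j in range(n)]
            for i in range(n)]

    # Local density: number of other texts closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Distance to the nearest text of higher density. Ties in density are
    # broken by index so that only one text per tie keeps a large delta.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
        else:
            delta[i] = min(dist[i][j] for j in order[:rank])

    # Remove noise points (assumed rule: density below min_density).
    candidates = [i for i in range(n) if rho[i] >= min_density]

    # Select centers: largest rho * delta among the remaining points.
    centers = sorted(candidates, key=lambda i: rho[i] * delta[i],
                     reverse=True)[:n_clusters]

    # Assign each remaining point to the cluster of its nearest center.
    labels = [-1] * n
    for i in candidates:
        nearest = min(centers, key=lambda c: dist[i][c])
        labels[i] = centers.index(nearest)
    return labels, centers


texts = [
    "apple banana fruit",
    "apple fruit juice",
    "banana fruit apple pie",
    "python code software",
    "software code bug",
    "python software bug code",
    "quantum",
]
labels, centers = density_peak_clusters(texts)
print(labels, centers)
```

In this toy corpus the last text shares no terms with any other, so its density is zero and the assumed noise rule excludes it before centers are chosen. A real corpus would need a proper tokenizer and a tuned cutoff.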

