A Clustering Algorithm for Semi-Structured Tibetan Text Based on Swarm Intelligence*

pp. 663-672

Keywords: swarm intelligence, Tibetan, cluster analysis, swarm similarity


Abstract:

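This paper applies swarm intelligence to the clustering of semi-structured Tibetan Web text and proposes a swarm-intelligence-based clustering algorithm for semi-structured Tibetan Web text (SCAST). Taking into account the effect of swarm intelligence on both the accuracy and the time efficiency of Tibetan text clustering, SCAST first represents the Tibetan texts with the vector space model and places the texts and a colony of intelligent ants at random positions in a text vector space. Each ant then randomly selects a Tibetan text, computes the similarity of that text within its current local region, and derives the probability of picking up or dropping the text, which determines whether the ant picks up, moves, or drops it. Finally, through repeated training iterations, the Tibetan texts are gathered together according to their similarity, yielding the final clustering result. Experiments on a large collection of real Tibetan Web texts show that, compared with the traditional k-means clustering algorithm, the swarm-intelligence-based Tibetan text clustering algorithm improves clustering accuracy by about 8.0% on average.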
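
The pick-up and drop decision described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give SCAST's actual formulas, so this sketch borrows the probability functions commonly used in ant-based clustering (in the style of Deneubourg et al. and Lumer and Faieta). The cosine distance, the parameters `alpha`, `k1`, and `k2`, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two term-weight vectors (vector space model)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def local_similarity(doc, neighbours, n_cells, alpha=0.5):
    """Average similarity f of `doc` to the texts in its local region.

    `n_cells` is the number of cells in the neighbourhood and `alpha`
    scales the distance; both are assumed parameters.
    """
    total = sum(
        1.0 - (1.0 - cosine_similarity(doc, other)) / alpha
        for other in neighbours
    )
    return max(0.0, total / n_cells)


def pick_probability(f, k1=0.1):
    """High when the text is unlike its surroundings (f small)."""
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.15):
    """High when the carried text resembles its surroundings (f large)."""
    return (f / (k2 + f)) ** 2


def ant_decides(carrying, f, rng, k1=0.1, k2=0.15):
    """Return 'pick', 'drop', or 'move' for one ant at one step."""
    if carrying:
        return "drop" if rng.random() < drop_probability(f, k2) else "move"
    return "pick" if rng.random() < pick_probability(f, k1) else "move"


# Example: a text surrounded by dissimilar texts is likely to be picked up.
rng = random.Random(0)
doc = [1.0, 0.0]
dissimilar = [[0.0, 1.0], [0.0, 1.0]]
f = local_similarity(doc, dissimilar, n_cells=2)
decision = ant_decides(carrying=False, f=f, rng=rng)
```

Repeating this decision for every ant over many iterations is what gradually moves similar texts into the same region. The squared response functions make the behaviour sharply sensitive to `f` near the thresholds `k1` and `k2`, which is a design choice of this sketch rather than something stated in the abstract.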

