Application Research of Computers (计算机应用研究), 2012

Chinese text similarity method research by combining semantic analysis with statistics
语义分析与词频统计相结合的中文文本相似度量方法研究

HUA Xiu-li, ZHU Qiao-ming, LI Pei-feng
华秀丽,朱巧明,李培峰

Keywords: vector space model, semantic analysis, term frequency, probability distribution, text similarity
向量空间模型,语义分析,词频,概率分布,文本相似度

Abstract:

Statistical text similarity measures use TF-IDF to model documents as term-frequency vectors and compute the similarity between documents with cosine similarity. Because this approach ignores the semantic information in the documents, the resulting similarity values are inaccurate. Semantics-based methods make up for this drawback, but they require background knowledge to construct the relationships between words. After studying the advantages and disadvantages of both kinds of methods, this paper presents a novel text similarity method that first preprocesses the text, then selects the terms with the highest TF-IDF values as feature items, next uses a semantic dictionary together with TF-IDF to compute text similarity, and finally uses several K-means clustering methods to evaluate the performance of the new similarity measure. Experimental results show that the method's F-measure is superior to that of the other methods, which shows that the proposed method is effective.
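The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the text has already been preprocessed into token lists, uses a common TF-IDF variant (relative term frequency times log(N/df)), and swaps in a soft cosine measure as one possible way to combine a semantic dictionary with TF-IDF weights. The names `tf_idf_vectors`, `top_k_features`, `soft_cosine`, and the toy `lexicon` of word-pair similarities are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document (docs are token lists)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors


def top_k_features(vector, k):
    """Keep the k terms with the highest TF-IDF weight as feature items."""
    ranked = sorted(vector.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])


def cosine(u, v):
    """Plain cosine similarity: only identical terms contribute."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_sim(a, b, lexicon):
    """Word similarity from a (hypothetical) semantic dictionary."""
    if a == b:
        return 1.0
    return lexicon.get(frozenset((a, b)), 0.0)


def soft_cosine(u, v, lexicon):
    """Soft cosine: related but different terms also contribute."""
    def bilinear(x, y):
        return sum(wx * wy * word_sim(tx, ty, lexicon)
                   for tx, wx in x.items() for ty, wy in y.items())
    denom = math.sqrt(bilinear(u, u)) * math.sqrt(bilinear(v, v))
    return bilinear(u, v) / denom if denom > 0.0 else 0.0


# Toy, already-tokenized documents (assumed output of preprocessing).
docs = [
    ["computer", "program", "code"],
    ["laptop", "software", "code"],
    ["river", "fish", "water"],
]
# Assumed similarity scores; a real system would read them from a
# semantic dictionary or thesaurus.
lexicon = {
    frozenset(("computer", "laptop")): 0.8,
    frozenset(("program", "software")): 0.9,
}

features = [top_k_features(vec, 2) for vec in tf_idf_vectors(docs)]
plain = cosine(features[0], features[1])
soft = soft_cosine(features[0], features[1], lexicon)
print(f"cosine: {plain:.2f}, soft cosine: {soft:.2f}")
```

In this toy setup the first two documents share no selected feature term, so plain cosine similarity is 0, while the dictionary-aware measure still rates them as close (0.85). That is the drawback of purely statistical similarity that the abstract describes. A pairwise similarity of this kind could then be passed to a clustering step and scored with the F-measure, as the abstract outlines.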
