Chinese text similarity method research by combining semantic analysis with statistics
语义分析与词频统计相结合的中文文本相似度量方法研究*

Keywords: vector space model, semantic analysis, term frequency, probability distribution, text similarity


Abstract:

Statistical text similarity measures use TF-IDF to model documents as term-frequency vectors and compute the similarity between documents with cosine similarity. Because this approach ignores the semantic information in the text, the resulting similarity values can be inaccurate. Semantics-based methods make up for this drawback, but they require a knowledge resource to construct the relationships between words. After studying the advantages and disadvantages of both kinds of methods, this paper presents a novel text similarity method that first preprocesses the text, then selects the terms with the highest TF-IDF values as feature items, next combines a semantic dictionary with TF-IDF to compute text similarity, and finally uses several K-means clustering methods to evaluate the performance of the new similarity measure. Experimental results show that the method's F-measure is superior to that of the other methods, which indicates that the proposed method is effective.
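To make the statistical baseline in the abstract concrete, the following is a minimal Python sketch of TF-IDF weighting, selection of the highest-weighted terms as feature items, and cosine similarity. It is not the authors' implementation: the function names, the use of relative term frequency with a plain logarithmic IDF, and the assumption that documents are already segmented into tokens are all illustrative choices. The semantic-dictionary step is left out, because the abstract does not say how it is combined with TF-IDF.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document.

    Assumption: TF is the relative term frequency and IDF is log(N / df).
    The paper may use a different weighting variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def top_k_features(vector, k):
    """Keep only the k terms with the highest TF-IDF weight."""
    best = sorted(vector.items(), key=lambda item: item[1], reverse=True)[:k]
    return dict(best)


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zero."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

In this sketch, two documents that share no terms always score 0.0, even if their words are synonyms. That is the limitation the abstract attributes to purely statistical methods, and it is what the semantic dictionary is meant to address.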
