%0 Journal Article %T Chinese text similarity method research by combining semantic analysis with statistics
语义分析与词频统计相结合的中文文本相似度量方法研究* %A HUA Xiu-li %A ZHU Qiao-ming %A LI Pei-feng %A
华秀丽 %A 朱巧明 %A 李培峰 %J 计算机应用研究 %D 2012 %I %X Statistics-based text similarity measures use TF-IDF to model text documents as term frequency vectors and compute the similarity between documents with cosine similarity. Because this approach ignores the semantic information in the documents, the resulting similarity values are inaccurate. Semantics-based methods make up for this drawback but require prior knowledge to construct the relationships between words. After studying the advantages and disadvantages of both kinds of methods, this paper presents a novel text similarity method that first pre-processes the text, then selects the terms with higher TF-IDF values as feature items, next uses a semantic dictionary together with TF-IDF to compute text similarity, and finally uses several K-means clustering methods to evaluate the performance of the new similarity measure. Experimental results show that the method's F-measure is superior to that of the other methods, which indicates that the proposed method is effective. %K vector space model %K semantic analysis %K term frequency %K probability distribution %K text similarity
向量空间模型 %K 语义分析 %K 词频 %K 概率分布 %K 文本相似度 %U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=A9D9BE08CDC44144BE8B5685705D3AED&aid=F950C016BC33E9012EA855FF25C79C61&yid=99E9153A83D4CB11&vid=771469D9D58C34FF&iid=38B194292C032A66&sid=07034C6B9EA4A53C&eid=BC60A9A1D91963F5&journal_id=1001-3695&journal_name=计算机应用研究&referenced_num=0&reference_num=13