A New Indexing Method Based on Word Proximity for Chinese Text Retrieval

Keywords: information retrieval, vector space model, automatic indexing, proximity-based indexing
Chinese keywords (translated as listed): electronic Chinese, directed-language documents, information recovery

Abstract:

This paper proposes a novel text representation and matching scheme for Chinese text retrieval. Current Chinese retrieval systems index text either by character or by word. Character-based indexing methods, such as bi-gram or tri-gram indexing, produce many false drops (documents retrieved only because they share character sequences with the query) due to mismatches between queries and documents. Word-based indexing systems, on the other hand, have difficulty efficiently identifying all proper nouns, domain-specific terminology, and phrases. The new indexing method represents text content using both the proximity and the mutual information of word pairs. This addresses the false-drop problem of character-based systems as well as the new-word and phrase problems of word-based systems. Evaluation results indicate that the average query precision of proximity-based indexing is 5.2% higher than the best TREC-5 results.
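The contrast between character bi-gram indexing and a proximity-based word-pair representation can be illustrated with a minimal sketch. It assumes the text has already been segmented into words, that "proximity" means two words co-occurring within a fixed window of positions, and that word pairs are weighted by pointwise mutual information over the collection. The function names, the window size, and the exact weighting formula are hypothetical illustrations, not the paper's published method. The bi-gram example uses the well-known case where 他和服务员 ("he and the waiter") contains the bi-gram 和服 ("kimono") across a word boundary, a typical false drop.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Overlapping character bi-grams, as produced by bi-gram indexing."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def proximity_pairs(words, window=3):
    """Unordered word pairs whose positions differ by at most `window`.

    Hypothetical proximity definition: a fixed-size co-occurrence window.
    """
    pairs = []
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + 1 + window, len(words))):
            pairs.append(tuple(sorted((w, words[j]))))
    return pairs


def pmi_weights(docs, window=3):
    """Weight each proximity pair by pointwise mutual information.

    Assumed formula: log2(P(a, b) / (P(a) * P(b))), with word and pair
    probabilities estimated from counts over the whole collection.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for words in docs:
        word_counts.update(words)
        pair_counts.update(proximity_pairs(words, window))
    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    weights = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n_pairs
        p_a = word_counts[a] / n_words
        p_b = word_counts[b] / n_words
        weights[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return weights


def match_score(query_words, doc_words, weights, window=3):
    """Sum the weights of proximity pairs shared by query and document."""
    shared = set(proximity_pairs(query_words, window)) & set(
        proximity_pairs(doc_words, window)
    )
    return sum(weights.get(pair, 0.0) for pair in shared)


if __name__ == "__main__":
    # Bi-gram indexing sees 和服 ("kimono") inside 他和服务员 ...
    print("和服" in char_bigrams("他和服务员"))
    # ... but a segmented word list never contains it as a unit.
    print("和服" in ["他", "和", "服务员"])
```

Representing a document by weighted word pairs rather than by isolated bi-grams means a match requires two words to appear close together in both the query and the document. This is one plausible way to reduce spurious character-level matches without needing a complete dictionary of phrases.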
