Journal of Software (软件学报), 2006

Information Retrieval Oriented Adaptive Chinese Word Segmentation System

Keywords: word segmentation system, word segmentation algorithm, information retrieval, new word recognition, disambiguation

Abstract:

New word recognition and ambiguity resolution have a vital effect on information retrieval precision. This paper presents an adaptive Chinese word segmentation algorithm based on a statistical model, and uses it to design and implement a new word segmentation system called BUAASEISEG. BUAASEISEG can recognize new words in various domains, perform disambiguation, and segment words of arbitrary length. It uses an iterative bigram method for word segmentation. Candidate word selection and disambiguation are performed through online statistical analysis of the target article, using either an offline word-frequency dictionary or the search engine's inverted index. On top of these statistical methods, post-processing with a stopword list, a quantity-suffix word list, and a surname list further improves precision. A comparative evaluation against the well-known Chinese word segmentation system ICTCLAS, using news articles and papers as test text, shows that BUAASEISEG outperforms ICTCLAS in new word recognition and disambiguation.
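The abstract names an "iterative bigram method" without giving its details. The sketch below is one plausible reading, offered only as an illustration and not as the BUAASEISEG algorithm: it assumes a merge procedure in which segmentation starts from single characters, adjacent token pairs are counted online over the target text, and the most frequent pair is repeatedly merged into a new token. Because merged tokens can be merged again, words of arbitrary length can emerge. The function names, the `min_count` threshold, and the stopword filter are hypothetical; the offline word-frequency dictionary, the inverted index, and the quantity-suffix and surname lists are not modeled.

```python
from collections import Counter


def count_pairs(seqs):
    """Count adjacent token pairs across all token sequences (online statistics)."""
    pairs = Counter()
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` in `seq` with its concatenation."""
    a, b = pair
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def iterative_bigram_segment(sentences, min_count=2, stopwords=frozenset()):
    """Hypothetical iterative bigram segmenter (an assumption, not the paper's method).

    Starts from single characters and repeatedly merges the most frequent
    adjacent pair until no pair occurs at least `min_count` times.
    Pairs containing a stopword are never merged, a crude stand-in for
    the stopword-based post-processing described in the abstract.
    """
    if min_count < 1:
        raise ValueError("min_count must be >= 1 to guarantee termination")
    seqs = [list(s) for s in sentences]
    while True:
        pairs = count_pairs(seqs)
        candidates = [
            (n, p) for p, n in pairs.items()
            if n >= min_count and p[0] not in stopwords and p[1] not in stopwords
        ]
        if not candidates:
            return seqs
        # Highest count wins; ties are broken deterministically by the pair itself.
        _, best = max(candidates)
        seqs = [merge_pair(s, best) for s in seqs]
```

For example, `iterative_bigram_segment(["信息检索", "信息检索系统", "检索信息"])` returns `[['信息检索'], ['信息检索', '系', '统'], ['检索', '信息']]`: the repeated term 信息检索 is merged into a single new word, while 系统, seen only once, stays split. A frequency-only merge like this tends to over-merge, which is presumably why the described system checks candidates against an offline dictionary or the search engine's inverted index.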
