Journal of Software (软件学报), 1998

Part of Speech Tagging Chinese Corpus Based on Statistics and Rules (统计与规则并举的汉语词性自动标注算法)

ZHANG Min (张民), LI Sheng (李生), ZHAO Tie-jun (赵铁军), ZHANG Yan-feng (张艳风)

Keywords: Chinese, part-of-speech tagging, hidden Markov model, rule, confidence intervals

Abstract:

This paper proposes an algorithm for automatically tagging the parts of speech (POS) of Chinese words. The algorithm integrates statistical and rule-based techniques, giving priority to quantitative statistical analysis. It employs confidence intervals in parameter estimation, so that the high-accuracy quantitative statistical technique takes top priority when tagging a corpus. The part of the corpus left untagged by the statistical stage is then tagged by rules, and rules also correct some of the errors made by the statistical stage. Closed and open tests show accuracies of 98.9% and 98.1%, respectively, excluding unknown words and segmentation errors.
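The abstract does not give the paper's formulas, so the Python sketch below only illustrates the two-stage "statistics first, rules second" idea under loudly stated assumptions. In place of the paper's hidden Markov model, it uses a simple per-word tag-frequency lexicon. It treats a statistical decision as reliable only when the 95% Wilson score interval for the most frequent tag lies entirely above the interval for the runner-up tag. The two context rules (after 的, before 了), the tag set (`r`, `u`, `n`, `v`), the toy corpus, and all function names are hypothetical illustrations, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z=1.96 ~ 95%)."""
    if n == 0:
        return 0.0, 1.0
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def train_lexicon(tagged_sentences):
    """Count how often each word carries each tag in a tagged training corpus."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return counts


def statistical_pass(words, counts, z=1.96):
    """Assumed confidence test: tag a word only if the lower bound for its best
    tag exceeds the upper bound for the second-best tag; otherwise leave None."""
    tags = []
    for w in words:
        c = counts.get(w)
        if not c:
            tags.append(None)  # unknown word: left to the rule stage
            continue
        n = sum(c.values())
        ranked = c.most_common(2)
        best_tag, best_k = ranked[0]
        second_k = ranked[1][1] if len(ranked) > 1 else 0
        best_lo, _ = wilson_interval(best_k, n, z)
        _, second_hi = wilson_interval(second_k, n, z)
        tags.append(best_tag if best_lo > second_hi else None)
    return tags


def rule_pass(words, tags, counts):
    """Apply hypothetical rules: one correction rule that may override a
    statistical tag, then gap-filling rules for words still untagged."""
    out = list(tags)
    for i, w in enumerate(words):
        prev_w = words[i - 1] if i > 0 else None
        next_w = words[i + 1] if i + 1 < len(words) else None
        seen = counts.get(w, Counter())
        # Hypothetical correction rule: after 的, prefer a noun reading
        # whenever the lexicon allows one, even over a statistical tag.
        if prev_w == "的" and "n" in seen:
            out[i] = "n"
        if out[i] is not None:
            continue
        if next_w == "了":
            out[i] = "v"  # hypothetical rule: word before 了 is a verb
        elif seen:
            out[i] = seen.most_common(1)[0][0]  # fall back to most frequent tag
        else:
            out[i] = "n"  # hypothetical default for unknown words
    return out


def tag(words, counts):
    """Statistics first, then rules, following the order in the abstract."""
    return rule_pass(words, statistical_pass(words, counts), counts)


# Toy usage with a made-up training corpus (not data from the paper).
TRAIN = (
    [[("我们", "r"), ("研究", "v"), ("语言", "n"), ("了", "u")]] * 30
    + [[("语言", "n"), ("的", "u"), ("研究", "n")]] * 2
)
counts = train_lexicon(TRAIN)
sentence = ["我们", "的", "研究", "完成", "了"]
print(statistical_pass(sentence, counts))  # ['r', None, 'v', None, 'u']
print(tag(sentence, counts))               # ['r', 'u', 'n', 'v', 'u']
```

In this toy run, 的 is left to the rule stage because it was seen only twice, so its interval is too wide. The statistical tag `v` for 研究 is overridden by the correction rule, and the unknown word 完成 is filled in by the 了 rule. Comparing two intervals drawn from the same multinomial count is a heuristic, not a formal significance test.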
