%0 Journal Article
%T Analysis and Improvement of Statistics-Based Chinese Part-of-Speech Tagging
基于统计的汉语词性标注方法的分析与改进
%A WEI Ou
%A WU Jian
%A SUN Yu-fang
%A 魏欧
%A 吴健
%A 孙玉芳
%J 软件学报
%D 2000
%X This paper studies a popular statistics-based training and tagging method for Chinese text and analyzes the nonlinear relation between the training set and tagging accuracy in terms of the structure and numerical values of the transition probability matrix and the symbol probability matrix. To make full use of the training corpus and achieve higher tagging accuracy, the training and tagging method is improved in two respects: exploiting other grammatical attributes of words and strengthening the handling of unknown words. With the improved method, open and closed tests yielded overall accuracies of about 96.5% and 96%, respectively.
%K Part-of-Speech tagging
%K n-gram
%K corpus
%K grammatical attribute
%K 词性标注
%K n元语法
%K 语料
%K 语法属性
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=D6ABDE4B11990F32&yid=9806D0D4EAA9BED3&vid=708DD6B15D2464E8&iid=E158A972A605785F&sid=8143FF92EEF26F96&eid=03436AC72A659ACA&journal_id=1000-9825&journal_name=软件学报&referenced_num=15&reference_num=7