Cascaded Chinese Chunk Recognition

DOI: 10.13190/jbupt.200801.14.qiny, PP. 14-17

Keywords: Chinese chunking, boundary identification, type identification, conditional random fields


Abstract:

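Most statistical approaches to Chinese chunking borrow from the CoNLL-2000 English chunking task: they represent chunks with the BIO model and treat chunk recognition as a multi-class classification problem of labeling a word sequence. To reduce classification complexity, this paper adopts a decomposed approach that first identifies chunk boundaries and then determines chunk types. A cascaded chunk recognizer is built on conditional random fields (CRFs), and the Penn Chinese Treebank (CTB5.1) serves as the experimental data set. Feature selection follows the methods used for Chinese word segmentation. In 5-fold cross-validation, chunk boundary identification reaches an F1 of 95.05%, type identification reaches an accuracy of 99.43%, and the overall F1 is 93.58%. The method improves system performance and shortens the training time of the learner.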
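The decomposition described in the abstract can be illustrated on the label level. The following is a minimal sketch, not the authors' implementation: it leaves out the CRF models entirely and only shows how a typed BIO sequence splits into an untyped boundary sequence (stage one) plus one type per chunk (stage two), and how the two are merged back. The function names and the toy tag sequence are hypothetical.

```python
from typing import List, Tuple


def split_bio(tags: List[str]) -> Tuple[List[str], List[str]]:
    """Split typed BIO tags into boundary-only tags and one type per chunk."""
    # "B-NP" -> "B", "I-NP" -> "I", "O" -> "O"
    boundary = [t.split("-", 1)[0] for t in tags]
    # Each chunk contributes exactly one type, read from its B- tag.
    types = [t.split("-", 1)[1] for t in tags if t.startswith("B-")]
    return boundary, types


def boundary_spans(boundary: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) spans from untyped B/I/O tags.

    Assumption of this sketch: an "I" with no open chunk is ignored.
    """
    spans = []
    start = None
    for i, tag in enumerate(boundary):
        if tag in ("B", "O") and start is not None:
            spans.append((start, i))  # close the currently open chunk
            start = None
        if tag == "B":
            start = i  # open a new chunk
    if start is not None:
        spans.append((start, len(boundary)))
    return spans


def merge_bio(boundary: List[str], types: List[str]) -> List[str]:
    """Recombine stage-one boundaries with stage-two types into typed BIO tags."""
    spans = boundary_spans(boundary)
    if len(spans) != len(types):
        raise ValueError("need exactly one type per chunk span")
    tags = ["O"] * len(boundary)
    for (start, end), chunk_type in zip(spans, types):
        tags[start] = f"B-{chunk_type}"
        for i in range(start + 1, end):
            tags[i] = f"I-{chunk_type}"
    return tags


# Toy example (hypothetical tags, not taken from CTB5.1).
gold = ["B-NP", "I-NP", "B-VP", "O", "B-NP"]
boundary, types = split_bio(gold)
restored = merge_bio(boundary, types)
```

In a cascaded system along these lines, the first classifier would predict the boundary sequence with only three labels, and the second would assign a type to each span from `boundary_spans`. Here the gold types stand in for the second stage's predictions.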
