2017

Word Structure Analysis Based on Conditional Random Fields (基于条件随机场的词结构分析方法)

Keywords: word structure analysis, conditional random fields, word structure features, word structure tag set

Abstract:

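Traditional Chinese word segmentation, which only marks word boundaries, can no longer meet the needs of practical applications. Through an in-depth analysis of the internal structure of words, this paper proposes a word structure analysis method based on conditional random fields. Drawing on the characteristics of the components of pseudo out-of-vocabulary (POOV) words and on the representation of a word's internal substructures, it proposes word structure features that improve both the recognition rate of out-of-vocabulary (OOV) words and the performance of word structure recognition. A generalized word structure tag set is derived from the representation of the internal structure of words, unifying word-boundary tags and word-internal-structure tags. The tag set applies both to labeling word boundaries in the traditional Chinese word segmentation task and to labeling the internal structure of words in the word structure analysis task. The method recovers word boundaries and internal structure at the same time, which addresses the inconsistency of segmentation standards across corpora and serves the differing needs of applications. Experimental results show that the method improves on existing methods both in overall performance and in recognition at each structural level.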
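To make the idea of character-level tagging concrete, the sketch below shows how word boundaries can be encoded as per-character tags and decoded again. It uses the widely used four-tag B/M/E/S boundary scheme as an assumption; the generalized word structure tag set proposed in the paper is not specified in the abstract and is not reproduced here. The function names `words_to_tags` and `tags_to_words` are hypothetical, and no CRF training is shown.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Map each character of a segmented sentence to a B/M/E/S boundary tag.

    B = first character of a multi-character word, M = middle character,
    E = last character, S = single-character word.
    (Assumed scheme; not the paper's generalized tag set.)
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Recover the word sequence from characters and their boundary tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


if __name__ == "__main__":
    segmented = ["条件", "随机场", "的", "方法"]
    tags = words_to_tags(segmented)
    print(tags)
    print(tags_to_words("".join(segmented), tags))
```

A sequence labeler such as a CRF would predict the tag list from the raw character sequence; a tag set that also encodes internal structure would extend this inventory so that a single labeling pass yields both boundaries and word-internal brackets.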
