2017

CPACA: A Probability-Model Chinese Word Segmentation Algorithm Based on the Aho-Corasick Automaton Algorithm

DOI: 10.3969/j.issn.1001-0548.2017.02.018

Keywords: AC automaton, Chinese word segmentation, dynamic programming, Trie tree

Abstract:

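The Aho-Corasick automaton is a well-known multi-pattern string matching algorithm. When a pattern fails to match, it follows the fail pointer to a valid successor state, and there may be one or more such valid successor states. Building on this property, this paper proposes an automaton algorithm adapted to Chinese word segmentation. The algorithm uses dynamic programming to compute context matching probabilities and transitions to the best valid successor state, thereby combining mechanical segmentation based on string matching with segmentation based on statistical probability models. Experimental results show that the algorithm achieves high segmentation accuracy.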
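The abstract does not specify the exact probability model, so the following is only a minimal sketch of the general idea under stated assumptions. It is not the CPACA algorithm itself. An Aho-Corasick automaton built over a dictionary reports every word ending at each position, including shorter words reached through fail links. A dynamic program then chooses the segmentation with the highest product of word probabilities. The paper's "context matching probability" is swapped here for an assumed unigram model. The names `ACAutomaton`, `segment`, `word_prob` and `unknown_prob`, as well as all probability values, are hypothetical.

```python
import math
from collections import deque


class ACAutomaton:
    """Aho-Corasick automaton: a trie plus fail links and per-state outputs."""

    def __init__(self, words):
        self.goto = [{}]  # goto[s][ch] -> next state (trie edges)
        self.fail = [0]   # fail[s] -> state of the longest proper suffix in the trie
        self.out = [[]]   # out[s] -> lengths of dictionary words ending at state s
        for w in words:
            self._insert(w)
        self._build_fail_links()

    def _insert(self, word):
        s = 0
        for ch in word:
            if ch not in self.goto[s]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[s][ch] = len(self.goto) - 1
            s = self.goto[s][ch]
        self.out[s].append(len(word))

    def _build_fail_links(self):
        # BFS guarantees shallower states are finished before deeper ones.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                # Inherit the words that are suffixes of this state's string.
                self.out[t] = self.out[t] + self.out[self.fail[t]]
                queue.append(t)

    def matches_ending_at(self, text):
        """Yield (end_index, word_length) for every dictionary word in text."""
        s = 0
        for i, ch in enumerate(text):
            while s and ch not in self.goto[s]:
                s = self.fail[s]  # mismatch: fall back along fail links
            s = self.goto[s].get(ch, 0)
            for length in self.out[s]:
                yield i + 1, length


def segment(text, word_prob, unknown_prob=1e-8):
    """Highest-probability segmentation under an assumed unigram model."""
    ac = ACAutomaton(word_prob)
    n = len(text)
    ends = [[] for _ in range(n + 1)]  # ends[i] -> lengths of words ending at i
    for end, length in ac.matches_ending_at(text):
        ends[end].append(length)

    best = [-math.inf] * (n + 1)  # best[i] -> best log-probability of text[:i]
    back = [0] * (n + 1)          # back[i] -> length of the last word in best[i]
    best[0] = 0.0
    for i in range(1, n + 1):
        # Always allow a single character so unknown text never blocks the DP.
        for length in sorted(set(ends[i]) | {1}):
            word = text[i - length:i]
            score = best[i - length] + math.log(word_prob.get(word, unknown_prob))
            if score > best[i]:
                best[i], back[i] = score, length

    words, i = [], n
    while i > 0:
        words.append(text[i - back[i]:i])
        i -= back[i]
    return words[::-1]


if __name__ == "__main__":
    probs = {"研究": 0.1, "研究生": 0.05, "生命": 0.08, "命": 0.01, "的": 0.2, "起源": 0.06}
    print(segment("研究生命的起源", probs))  # ['研究', '生命', '的', '起源']
```

The example sentence contains the overlapping ambiguity 研究生/生命. Both readings appear as candidates because the automaton reports every dictionary word that ends at a given position. The dynamic program then keeps the higher-scoring path. Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on long inputs.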
