Segmenting Encyclopedia Text into Passages with a Semi-CRF Model

Keywords: natural language processing, machine learning, hidden Markov model, text segmentation, semi-Markov conditional random fields

Abstract:

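This paper presents a method for segmenting encyclopedia text into passages using semi-Markov conditional random fields (semi-CRFs). Plain HMM and CRF models tend to assign the same passage type repeatedly within one entry. To overcome this, the method takes the post-processed posterior distribution over HMM states as its basic evidence, and adapts further to the target text with two kinds of features: passage-start features derived from a lexical-semantic ontology knowledge base, and cue features aimed at specific passage types. Experimental results show that the method can combine many different types of information, fits the passage structure of encyclopedia text well, and outperforms both the plain HMM model and the plain CRF model.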
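The abstract does not give the decoding procedure, so the following is only a minimal sketch of generic semi-Markov Viterbi decoding, not the authors' implementation. It scores whole segments rather than single units and forbids two adjacent segments from sharing a label, which is one simple way to discourage repeated passage types. The names `semi_markov_viterbi` and `make_posterior_score`, the labels `definition` and `history`, the posterior values, and the per-segment penalty are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def semi_markov_viterbi(
    n: int,
    labels: List[str],
    max_len: int,
    segment_score: Callable[[int, int, str], float],
) -> Tuple[float, List[Segment]]:
    """Find the best labeled segmentation of units 0..n-1.

    Adjacent segments must carry different labels (an illustrative constraint).
    """
    neg_inf = float("-inf")
    if n == 0:
        return 0.0, []
    # best[j][y]: best score for units [0, j) when the last segment has label y
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for y in labels:
            for d in range(1, min(max_len, j) + 1):
                i = j - d
                score = segment_score(i, j, y)
                prev = None
                if i > 0:
                    # The previous segment must use a different label.
                    others = [p for p in labels if p != y]
                    if not others:
                        continue
                    prev = max(others, key=lambda p: best[i][p])
                    if best[i][prev] == neg_inf:
                        continue
                    score += best[i][prev]
                if score > best[j][y]:
                    best[j][y] = score
                    back[j][y] = (i, prev)
    last = max(labels, key=lambda y: best[n][y])
    if best[n][last] == neg_inf:
        raise ValueError("no valid segmentation under the given constraints")
    # Follow back-pointers from the end of the text to recover the segments.
    segments: List[Segment] = []
    j, y = n, last
    while j > 0:
        i, prev = back[j][y]
        segments.append((i, j, y))
        j, y = i, prev
    segments.reverse()
    return best[n][last], segments


def make_posterior_score(
    posteriors: List[Dict[str, float]], segment_penalty: float = 0.0
) -> Callable[[int, int, str], float]:
    """Toy segment score: summed log posterior minus a per-segment penalty."""
    def score(i: int, j: int, y: str) -> float:
        return sum(math.log(posteriors[t][y]) for t in range(i, j)) - segment_penalty
    return score


# Hypothetical per-sentence posteriors, standing in for HMM state posteriors.
posteriors = [
    {"definition": 0.9, "history": 0.1},
    {"definition": 0.8, "history": 0.2},
    {"definition": 0.4, "history": 0.6},  # one noisy sentence
    {"definition": 0.9, "history": 0.1},
    {"definition": 0.9, "history": 0.1},
]
labels = ["definition", "history"]
print(semi_markov_viterbi(5, labels, 5, make_posterior_score(posteriors)))
print(semi_markov_viterbi(5, labels, 5, make_posterior_score(posteriors, 2.0)))
```

With no penalty, the noisy third sentence splits the text so that `definition` appears twice. With a per-segment penalty, the decoder prefers a single `definition` segment. Segment-level features of this kind cannot be expressed in a per-unit HMM or linear-chain CRF, which is the motivation for scoring whole segments.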
