Boosting-Based Information Extraction from Semi-Structured Text

Keywords: Boosting algorithm, extraction rules, semi-structured text

Abstract:

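To automate information extraction from semi-structured text, this paper presents an extraction method based on the Boosting algorithm. The method automatically generates a rule from a single training example, applies that rule to the set of positive examples and updates the weight distribution over that set, then selects the positive example with the largest weight to generate the next rule. The paper also gives a pattern-matching constraint that can describe words which do not conform to English lexical rules. Experiments show that when learning extraction rules over simple features, the method reaches 100% precision and recall; when learning extraction rules over more complex features, its F1 score still exceeds 80%.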
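The rule-generation loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how a rule is built from an example or how weights are updated, so `token_shape`, `rule_from_example`, the down-weighting factor `beta`, and the stopping condition are all assumptions made here for the sake of a runnable example.

```python
from typing import Callable, List

Rule = Callable[[str], bool]


def token_shape(token: str) -> str:
    """Hypothetical generalisation: map a token to its character-class shape.

    Digits become '9', uppercase letters 'A', other letters 'a'; any other
    character is kept as-is. Runs of the same class are collapsed.
    """
    out: List[str] = []
    for ch in token:
        if ch.isdigit():
            c = "9"
        elif ch.isalpha():
            c = "A" if ch.isupper() else "a"
        else:
            c = ch
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)


def rule_from_example(example: str) -> Rule:
    """Assumed rule generator: match any token with the same shape as the seed."""
    shape = token_shape(example)
    return lambda text: token_shape(text) == shape


def learn_rules(positives: List[str], max_rules: int = 10, beta: float = 0.1) -> List[Rule]:
    """Boosting-style loop: seed each new rule from the heaviest positive example."""
    n = len(positives)
    if n == 0:
        return []
    weights = [1.0 / n] * n          # start from a uniform distribution
    covered = [False] * n
    rules: List[Rule] = []
    for _ in range(max_rules):
        # Pick the positive example with the largest current weight.
        seed = max(range(n), key=lambda i: weights[i])
        rule = rule_from_example(positives[seed])
        rules.append(rule)
        # Apply the rule to the positive set; down-weight what it covers
        # (assumed update: multiply by beta, then renormalise).
        for i, example in enumerate(positives):
            if rule(example):
                weights[i] *= beta
                covered[i] = True
        total = sum(weights)
        weights = [w / total for w in weights]
        if all(covered):             # assumed stopping condition
            break
    return rules


if __name__ == "__main__":
    examples = ["2003", "1999", "AB-12", "CD-34", "x9"]
    learned = learn_rules(examples)
    print(len(learned), "rules learned")
```

Because covered examples lose weight after each round, the next seed is always drawn from examples the current rule set handles poorly, which is the behaviour the abstract attributes to the method.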
