Journal of Beijing University of Technology, 2005

Boosting-Based Semi-Structured Information Extraction

Keywords: Boosting algorithm, extraction rules, semi-structured text

Abstract:

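To extract information automatically from semi-structured text, this paper presents an information extraction method based on the Boosting algorithm. The method automatically generates a rule from a single training example, applies that rule to the set of positive examples and updates the weight distribution over that set, then selects the positive example with the largest weight to generate the next rule. A pattern-matching constraint is also given that can describe words that do not conform to English lexical rules. Experiments show that when learning extraction rules with simple features, the method's precision and recall can reach 100%; when learning extraction rules with more complex features, its F1 score can also exceed 80%.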
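The rule-generation loop described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the `shape` generalization (collapsing digit runs to `D` and letter runs to `A`), the down-weighting factor `beta`, and the names `learn_rules` and `max_rules` are all hypothetical, since the abstract does not specify the rule language or the exact weight update.

```python
import re


def shape(text):
    """Hypothetical rule generalization (an assumption, not from the paper):
    collapse each run of digits to 'D' and each run of ASCII letters to 'A'."""
    return re.sub(
        r"\d+|[A-Za-z]+",
        lambda m: "D" if m.group()[0].isdigit() else "A",
        text,
    )


def learn_rules(positives, beta=0.5, max_rules=10):
    """Greedy loop in the spirit of the abstract: build a rule from the
    heaviest positive example, down-weight what the rule covers, repeat."""
    n = len(positives)
    weights = [1.0 / n] * n          # start from a uniform distribution
    covered = set()
    rules = []
    while len(rules) < max_rules and len(covered) < n:
        # Pick the positive example with the largest current weight.
        seed = max(range(n), key=lambda i: weights[i])
        rule = shape(positives[seed])
        rules.append(rule)
        # Apply the rule to the positive set and shrink covered weights
        # (beta is an assumed update factor).
        for i, example in enumerate(positives):
            if shape(example) == rule:
                weights[i] *= beta
                covered.add(i)
        # Renormalize so the weights remain a distribution.
        total = sum(weights)
        weights = [w / total for w in weights]
    return rules


if __name__ == "__main__":
    examples = ["Tel: 010-1234", "Tel: 021-5678", "Room B12"]
    print(learn_rules(examples))
```

Because covered examples lose weight after each round, the next seed is always an example that no existing rule matches, so each new rule extends coverage of the positive set.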
