-  2013 

Open Entity Attribute-Value Extraction from Unstructured Text

Keywords: attribute extraction, unstructured text, Infobox, Baidu Baike (Baidu encyclopedia)


Abstract:

This paper proposes an approach for extracting the attributes and attribute values of a given entity from unstructured text, treating attribute extraction as a sequence labeling problem. To avoid manually annotating a training corpus, the structured content already present in the Infoboxes of Baidu Baike (Baidu encyclopedia) is used to label the unstructured text automatically, which produces the training data. Once the training corpus is obtained, multidimensional features chosen to suit the characteristics of Chinese are used to train the sequence labeling model, and contextual information is used to further improve system performance. The trained model then extracts entity attributes and attribute values from unstructured text. Experimental results show that the method is effective across multiple categories of Baidu Baike and that it can be extended directly to attribute extraction from similar unstructured text.
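The back-labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `back_label`, the exact-match strategy, the character-level BIO tag scheme, and the example sentence and Infobox are all assumptions made for illustration. Character-level tagging is assumed here because it suits unsegmented Chinese text; the abstract does not state the tagging granularity.

```python
from typing import Dict, List, Tuple


def back_label(sentence: str, infobox: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag each character with a BIO label by locating Infobox values in the text.

    Hypothetical helper: exact string matching stands in for whatever
    matching strategy the paper actually uses.
    """
    tags = ["O"] * len(sentence)
    # Try longer values first so a short value cannot claim part of a longer one.
    for attr, value in sorted(infobox.items(), key=lambda kv: -len(kv[1])):
        if not value:
            continue
        start = sentence.find(value)
        while start != -1:
            end = start + len(value)
            # Only label spans that no other attribute has already claimed.
            if all(tag == "O" for tag in tags[start:end]):
                tags[start] = f"B-{attr}"
                for i in range(start + 1, end):
                    tags[i] = f"I-{attr}"
            start = sentence.find(value, end)
    return list(zip(sentence, tags))


# Hypothetical entity text and Infobox content, for illustration only.
example_sentence = "Li Bai was born in 701."
example_infobox = {"birth_year": "701"}
labeled = back_label(example_sentence, example_infobox)
```

Sequences produced this way could serve as automatically generated training data for a sequence labeling model, which is the role the abstract assigns to the back-labeled corpus.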
