%0 Journal Article
%T Entity set expansion based on LDA and label propagation
%A MA Yu-feng (马宇峰)
%A RUAN Tong (阮彤)
%J 山东大学学报(理学版)
%D 2015
%R 10.6040/j.issn.1671-9352.3.2014.101
%X Entity set expansion takes a few example entities of a category as seeds and expands them into a more complete set of entities belonging to that category. A widely used approach is iterative bootstrapping, which needs only a small amount of supervision but scales poorly to very large corpora. Traditional methods of this kind rely mainly on co-occurrence between entities and expand iteratively according to their similarity, which leads to semantic drift: as bootstrapping proceeds, unreliable patterns are likely to produce false extractions, and precision suffers. To address this issue, a hybrid method for entity set expansion is proposed that first uses an LDA (latent Dirichlet allocation) topic model to obtain the semantic information of the seed set and then expands the set by label propagation. Considering the semantics of the entity list as a whole avoids the ambiguity that a single word may introduce, and mining the topics of the contexts of entity lists with LDA enriches the semantic information available during expansion and resolves semantic drift. Experiments on real datasets demonstrate the effectiveness, efficiency, and scalability of the proposed method.
%K topic model
%K seed
%K LDA
%K label propagation
%K entity set expansion
%U http://lxbwk.njournal.sdu.edu.cn/CN/10.6040/j.issn.1671-9352.3.2014.101