%0 Journal Article
%T A Related Entity Resolution Algorithm Based on Tree Model
%A 王泽龙
%A 李贵
%A 李征宇
%A 韩子扬
%A 曹科研
%J Hans Journal of Data Mining
%P 241-252
%@ 2163-1468
%D 2021
%I Hans Publishing
%R 10.12677/HJDM.2021.114022
%X In the era of big data, Web data is both diverse and interrelated. In entity resolution, this means that the data set to be resolved often contains several entity sets that are related to one another, so the result of resolving one entity set can benefit the resolution of another. Entity resolution over such related entity sets is called related entity resolution. This paper addresses entity resolution for one-to-many related entities. It proposes a relevance tree model and derives from it the concepts of similar nodes, similar trees, and similarity transfer, then presents a tree-based relevance tree search algorithm. A relevance tree is first built from the association relationships among the related entities. Whether two nodes match is decided by combining the attribute similarity of the nodes themselves with the similarity of some attributes of their associated child nodes. The algorithm traverses the relevance tree depth-first; at each node it uses the entity resolution result for that node to select the child nodes that satisfy similarity transitivity, and it continues downward until the leaf nodes have been traversed, producing partially similar subtrees. It then matches the nodes in the root's child set against one another to find further similar subtrees. A similar tree index is proposed to represent the matching results of the relevance tree. A relevance tree merging algorithm is also proposed: guided by the similar tree index, it merges the nodes of the three types of relevance subtrees that can appear in a relevance tree while preserving the relationships between entities, yielding a "neat" relevance tree. Experiments on one-to-many related real estate data show that the proposed relevance tree search algorithm is more efficient than an existing related entity recognition algorithm.
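The matching step summarized in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the `Node` class, the `attr_sim` measure (share of equal attribute values), the weight `alpha`, the threshold, and the toy real estate records are all assumptions made for illustration. The sketch only shows the general idea of combining a node's own attribute similarity with that of its child nodes, and of descending depth-first only below node pairs that already match.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical relevance-tree node: an entity plus its related (child) entities."""
    key: str
    attrs: dict
    children: list = field(default_factory=list)


def attr_sim(a: dict, b: dict) -> float:
    """Assumed similarity measure: fraction of attributes with equal values."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(1 for k in keys if k in a and k in b and a[k] == b[k]) / len(keys)


def node_sim(x: Node, y: Node, alpha: float = 0.5) -> float:
    """Combine the nodes' own similarity with evidence from their child nodes."""
    own = attr_sim(x.attrs, y.attrs)
    if not x.children or not y.children:
        return own
    # For each child of x, take its best similarity against the children of y.
    best = [max(attr_sim(c.attrs, d.attrs) for d in y.children) for c in x.children]
    return alpha * own + (1 - alpha) * sum(best) / len(best)


def match_trees(x: Node, y: Node, threshold: float = 0.7, pairs=None) -> list:
    """Depth-first matching: children are compared only below a matched pair."""
    if pairs is None:
        pairs = []
    if node_sim(x, y) < threshold:
        return pairs  # prune: the similarity is not passed down this branch
    pairs.append((x.key, y.key))
    for c in x.children:
        for d in y.children:
            match_trees(c, d, threshold, pairs)
    return pairs


# Toy data (invented): two records of one housing community with its buildings.
community_a = Node("A", {"name": "Sunshine Garden", "district": "North", "developer": "Acme"}, [
    Node("A1", {"no": "1", "floors": 18}),
    Node("A2", {"no": "2", "floors": 18}),
])
community_b = Node("B", {"name": "Sunshine Garden", "district": "North", "developer": "ACME Ltd"}, [
    Node("B1", {"no": "1", "floors": 18}),
    Node("B3", {"no": "3", "floors": 24}),
])
matched = match_trees(community_a, community_b)
```

In this toy case the two community records fall below the threshold on their own attributes alone, and it is the evidence from their buildings that lifts them above it, which is the benefit of related resolution that the abstract describes. The list of matched pairs stands in, very loosely, for the similar tree index.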
%K Related Entity
%K Related Tree
%K Similar Nodes
%K Similar Tree
%K Entity Resolution
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=46151