%0 Journal Article
%T A Map-Reduce-Based Parallel Approach for Geospatial Data Interlinking in a Semantic Web
%A 杨雯雨
%J Geomatics Science and Technology
%P 90-100
%@ 2329-7239
%D 2019
%I Hans Publishing
%R 10.12677/GST.2019.72014
%X The Web of Data represents an intermediate step towards the Semantic Web, and constructing links among different Resource Description Framework (RDF) datasets is a key issue in building it. An identity link, an important type of RDF link, aims to match identical entities from different datasets. Of the many approaches to constructing identity links between geospatial entities, this paper adopts a similarity-based one, using the Hausdorff distance to compute the location and shape similarity between two entities. Because computing the Hausdorff distance is expensive and geospatial data are intrinsically large, the entire matching process is very time-consuming. This paper proposes a Map-Reduce-based framework that parallelizes the similarity computation and significantly reduces the runtime. The approach was verified in an experiment that built identity links between data from the Nomenclature of Territorial Units for Statistics (NUTS) and the Database of Global Administrative Areas (GADM). The matching precision was high; the runtime exceeded one day on 1 node, whereas with the proposed parallel framework it fell to approximately 3 h on 8 nodes.
%K Map-Reduce
%K Data Interlinking
%K Geospatial Semantic Data
%K Hausdorff Distance
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=29654