A Map-Reduce-Based Parallel Approach for Geospatial Data Interlinking in a Semantic Web

DOI: 10.12677/GST.2019.72014, pp. 90-100

Keywords: Map-Reduce, Data Interlinking, Geospatial Semantic Data, Hausdorff Distance


Abstract:

The Web of Data is an intermediate step towards the Semantic Web, and constructing links among different Resource Description Framework (RDF) datasets is a key issue in building it. An identity link, an important type of RDF link, matches identical entities across different datasets. Among the many approaches to constructing identity links between geospatial entities, this paper adopts a similarity-based approach that uses the Hausdorff distance to compute the location and shape similarity between two entities. Because the Hausdorff distance is expensive to compute and geospatial datasets are intrinsically large, the entire matching process is very time-consuming. This paper proposes a Map-Reduce-based framework that parallelizes the similarity computation and significantly reduces the runtime. The approach was evaluated in an experiment that constructed identity links between data from the Nomenclature of Territorial Units for Statistics (NUTS) and the Database of Global Administrative Areas (GADM). The matching precision was high. On 1 node the runtime exceeded one day, whereas with the proposed parallel framework on 8 nodes it was reduced to approximately 3 h.
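To make the idea in the abstract concrete, the following is a minimal sketch of Hausdorff-based matching arranged as a map step and a reduce step. It is an illustration under stated assumptions, not the paper's implementation: it uses the discrete Hausdorff distance over polygon vertices in planar coordinates (the paper may use a different variant or a geodetic metric), it runs in a single process rather than on a cluster, and the names `map_pair` and `reduce_best` as well as the toy entities are hypothetical.

```python
from math import hypot


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(hypot(ax - bx, ay - by) for bx, by in b) for ax, ay in a)


def hausdorff(a, b):
    """Symmetric discrete Hausdorff distance between two vertex lists."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def map_pair(pair):
    """Map step (hypothetical): score one candidate pair of entities.

    Emits (source id, (distance, target id)) so results can be grouped by source.
    """
    (src_id, src_pts), (dst_id, dst_pts) = pair
    return src_id, (hausdorff(src_pts, dst_pts), dst_id)


def reduce_best(mapped):
    """Reduce step (hypothetical): keep the closest target for each source."""
    best = {}
    for src_id, candidate in mapped:
        if src_id not in best or candidate < best[src_id]:
            best[src_id] = candidate
    return best


# Toy entities (assumed data, not taken from NUTS or GADM).
source_set = {"A": [(0, 0), (1, 0), (1, 1), (0, 1)]}
target_set = {
    "X": [(0, 0), (1, 0), (1, 1), (0, 1.1)],   # nearly the same square
    "Y": [(5, 5), (6, 5), (6, 6), (5, 6)],     # a distant square
}

# Every source/target pair is independent, which is what makes the map step
# easy to distribute across nodes in a real MapReduce job.
pairs = [((s, sp), (t, tp))
         for s, sp in source_set.items()
         for t, tp in target_set.items()]
links = reduce_best(map(map_pair, pairs))
print(links)
```

In this sketch each pair is scored independently, so the quadratic cost of the distance computation is the part that parallelizes. A real deployment would also need a distance threshold to reject sources with no good match, which is omitted here.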
