
带有范例元组的交互式数据转换映射方法研究
Research on Interactive Data Conversion Mapping Method with Example Tuples

DOI: 10.12677/HJDM.2021.112009, PP. 84-99

Keywords: Web Big Data, Data Integration, Data Exchange, Schema Mapping, Boolean Query


Abstract:

Schema mapping is a core problem in integrating heterogeneous big data from the Web. It is usually studied at two levels, the instance level and the schema level; this paper focuses on the schema level. Non-expert users who are unfamiliar with the transformation semantics and languages involved in schema transformation can hardly master and apply this technique in a short time. Building on existing work on data transformation, this paper therefore proposes an interactive schema-mapping design framework for non-expert users. First, the incomplete and weakly expressive example tuples of the data transformation supplied by non-expert users are preprocessed. Then, through simple user interaction, the system recursively poses Boolean queries about the validity of the initial example tuples and derives the final mapping rules. In addition, the paper proposes two strategies for exploring the full space of data-transformation mappings so that arbitrary user example tuples can be satisfied. During exploration, the system keeps the rules that best fit the user's needs according to the interaction results and dynamically prunes the search space, which reduces the number of user interactions. Experiments on data-integration transformations using data from the China Land Market Network (中国土地市场网) verify the effectiveness of the method.
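The interaction loop summarized in the abstract (start from a user example tuple, ask yes/no questions, prune candidate rules) can be illustrated with a minimal Python sketch. This is not the paper's algorithm. It assumes, purely for illustration, that a mapping rule is a plain column projection from a source row to a target row, and that the user can be modelled as a Boolean `oracle` callback. The names `apply_mapping`, `candidates_from_example`, `refine`, and the sample rows are all hypothetical.

```python
from itertools import permutations
from typing import Callable, Iterable

Row = tuple[str, ...]
# Assumption: a mapping rule is a projection; target column i is copied
# from source column mapping[i]. Real schema mappings are far richer.
Mapping = tuple[int, ...]


def apply_mapping(mapping: Mapping, row: Row) -> Row:
    """Build the target row produced by `mapping` from a source row."""
    return tuple(row[i] for i in mapping)


def candidates_from_example(source: Row, target: Row) -> list[Mapping]:
    """Enumerate every projection that turns `source` into `target`."""
    return [
        m
        for m in permutations(range(len(source)), len(target))
        if apply_mapping(m, source) == target
    ]


def refine(
    candidates: Iterable[Mapping],
    probes: Iterable[Row],
    oracle: Callable[[Row, Row], bool],
) -> tuple[list[Mapping], int]:
    """Prune candidates with yes/no questions; return survivors and question count."""
    remaining = list(candidates)
    questions = 0
    for row in probes:
        if len(remaining) <= 1:
            break
        outputs = {apply_mapping(m, row) for m in remaining}
        if len(outputs) == 1:
            continue  # all surviving rules agree on this row, so skip the question
        proposal = apply_mapping(remaining[0], row)
        answer = oracle(row, proposal)  # Boolean query: "is this output valid?"
        questions += 1
        # Keep only the rules whose behaviour matches the user's answer.
        remaining = [
            m for m in remaining if (apply_mapping(m, row) == proposal) == answer
        ]
    return remaining, questions


if __name__ == "__main__":
    # Hypothetical example: columns 1 and 2 hold the same value, so the
    # single example tuple cannot tell the two projections apart.
    cands = candidates_from_example(
        ("P-001", "Shenyang", "Shenyang"), ("Shenyang", "P-001")
    )
    probes = [("P-002", "Dalian", "Dalian"), ("P-003", "Dalian", "Jinzhou")]
    intended = (2, 0)  # the rule the simulated user has in mind
    result = refine(cands, probes, lambda r, out: out == apply_mapping(intended, r))
    print(cands, result)
```

In this toy run the first probe row is skipped because both surviving rules produce the same output for it, so the simulated user is only asked about the second row. Skipping such rows is one simple way to keep the number of interactions low.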

