A Schema Mapping Optimization Method for Web Data Exchange
Abstract:
Web data exchange is an important research topic in the integration of heterogeneous Web data sources and is usually addressed at two levels: the instance level and the schema level. This paper focuses on the schema level. A given source-to-target schema mapping usually produces exchange results that contain a large amount of redundancy. To generate redundancy-free data as the core solution of data exchange, this paper proposes a schema mapping design and optimization method based on homomorphisms. The method first introduces homomorphisms between schema mappings as the basis for rewriting them. It then decomposes the schema mapping, defines the degree of data redundancy produced by the different rules, and determines which rules need to be rewritten. Finally, the given schema mapping is rewritten into a core schema mapping that directly generates the core solution, and this mapping is converted into executable SQL statements that compute the core solution. Experiments on data from the China Land Market Network verify the effectiveness of the proposed method.
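To make the notion of redundancy concrete, the following minimal sketch computes the core of a small target instance by brute force. It is not the rewriting method of this paper, which avoids such post-processing by rewriting the mapping itself. It only illustrates what a core solution is: the smallest sub-instance that the original instance maps into by a homomorphism. The relation name `Parcel`, the `_N` prefix marking labeled nulls, the sample values, and the function names are all assumptions made for illustration, and the search is exponential in the number of nulls.

```python
from itertools import product


def is_null(value):
    # Assumption: labeled nulls are strings starting with "_N".
    return isinstance(value, str) and value.startswith("_N")


def apply_mapping(h, fact):
    # Apply a value mapping to a fact (relation, v1, v2, ...).
    # Constants are absent from h and therefore stay fixed.
    return (fact[0],) + tuple(h.get(v, v) for v in fact[1:])


def core(instance):
    """Brute-force core of an instance with labeled nulls.

    Repeatedly look for a homomorphism whose image is a proper
    sub-instance and shrink to that image until none exists.
    """
    inst = set(instance)
    while True:
        nulls = sorted({v for f in inst for v in f[1:] if is_null(v)})
        values = sorted({v for f in inst for v in f[1:]}, key=str)
        smaller = None
        # Try every assignment of nulls to values that occur in the instance.
        for image in product(values, repeat=len(nulls)):
            h = dict(zip(nulls, image))
            mapped = {apply_mapping(h, f) for f in inst}
            if mapped < inst:  # image is a proper sub-instance
                smaller = mapped
                break
        if smaller is None:
            return inst
        inst = smaller


# Hypothetical canonical solution produced by two overlapping rules:
# one invents a null for the region, the other copies the real region.
canonical = {
    ("Parcel", "P001", "_N1"),
    ("Parcel", "P001", "Beijing"),
}
print(core(canonical))  # {('Parcel', 'P001', 'Beijing')}
```

In this example the fact carrying the null `_N1` is redundant because mapping `_N1` to `Beijing` folds it onto the other fact. A core schema mapping would avoid generating such a fact in the first place.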