Data Transformation from Hierarchical Model to Relational Model Based on Example Programming

DOI: 10.12677/HJDM.2022.124032, PP. 334-350

Keywords: Web Data Integration, Data Conversion, Hierarchical Model, Example Programming

Abstract:

Combining data from multiple sources, storing it in a unified form, and building a data warehouse is an important step in web data integration. Data integration is achieved through data transformation and mainly addresses the distribution and heterogeneity of data. Many applications store and transfer data in hierarchical structures. This tree-based hierarchical model fits the underlying data well, so hierarchical formats are popular for exporting data and exchanging it between applications. To make storage and querying easier, such hierarchical data usually has to be converted into a relational representation. However, because hierarchical and relational data have different characteristics, and because the data sources to be processed can be large, this conversion requires considerable effort. To address this problem, this paper adopts a programming-by-example approach for migrating hierarchically structured documents to a relational format. A program synthesis algorithm decomposes the task of synthesizing a relational table into two subtasks, column extraction and row extraction, and learns the target transformation from input-output examples, so that XML or JSON documents can be converted into relational tables. Experimental results show that the method generates the required programs for transformation tasks from hierarchical data to relational data and carries out the data transformation on the datasets.
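The decomposition into column extraction and row extraction can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the paper's synthesis algorithm: the function names (`leaves`, `learn_columns`, `same_record`, `run`) and the sample document are hypothetical, columns are learned as generalized key paths that cover the example values, and the row predicate is fixed ("leaves agree on the list indices of their shared path prefix") rather than learned from the examples.

```python
import json
from itertools import combinations, product


def leaves(node, path=(), idx=()):
    """Yield (generalized_path, list_indices, value) for every scalar leaf.

    List positions are replaced by "*" in the path and recorded in idx.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaves(child, path + (key,), idx)
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from leaves(child, path + ("*",), idx + (i,))
    else:
        yield path, idx, node


def index_leaves(doc):
    """Group leaves by generalized path: path -> [(list_indices, value), ...]."""
    by_path = {}
    for path, idx, value in leaves(doc):
        by_path.setdefault(path, []).append((idx, value))
    return by_path


def learn_columns(doc, example_rows):
    """Column extraction: for each output column, pick the shortest path
    whose leaf values cover every example value of that column."""
    by_path = index_leaves(doc)
    columns = []
    for col in zip(*example_rows):
        wanted = set(col)
        candidates = [p for p, items in by_path.items()
                      if wanted <= {v for _, v in items}]
        if not candidates:
            raise ValueError(f"no path covers example column {col}")
        columns.append(min(candidates, key=len))
    return columns


def same_record(paths, idxs):
    """Fixed row predicate (an assumption of this sketch): every pair of
    leaves must agree on the list indices along their shared path prefix."""
    for (p1, i1), (p2, i2) in combinations(list(zip(paths, idxs)), 2):
        shared = 0
        for a, b in zip(p1, p2):
            if a != b:
                break
            if a == "*":
                shared += 1
        if i1[:shared] != i2[:shared]:
            return False
    return True


def run(doc, columns):
    """Row extraction: filter the cross product of the column leaves."""
    by_path = index_leaves(doc)
    rows = []
    for combo in product(*(by_path[p] for p in columns)):
        if same_record(columns, [i for i, _ in combo]):
            rows.append(tuple(v for _, v in combo))
    return rows


# Hypothetical input document and a single example output row.
doc = json.loads("""
{"movies": [
  {"title": "Alien", "year": 1979,
   "cast": [{"name": "Weaver"}, {"name": "Hurt"}]},
  {"title": "Heat", "year": 1995,
   "cast": [{"name": "Pacino"}]}
]}
""")
columns = learn_columns(doc, [("Alien", "Weaver")])
table = run(doc, columns)
print(table)
```

From the single example row, the sketch picks one path per column and then generalizes to the rest of the document by keeping only value combinations that belong to the same movie record. A learned row predicate, as the abstract describes, would replace the hard-coded `same_record` check.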

