Intelligent ETL for Enterprise Software Applications Using Unstructured Data

DOI: 10.4236/jsea.2025.181003, pp. 44-65

Keywords: Structured Data, Relational Model, LLM-Powered Agents, Field-Level Extraction, Knowledge Graph


Abstract:

Enterprise applications rely on relational databases and structured business processes, so inputs and outputs drawn from business documents such as invoices, purchase orders, and receipts must be converted into known templates and schemas before processing, a step that is slow and expensive. We propose an LLM agent-based intelligent data extraction, transformation, and load (IntelligentETL) pipeline that ingests PDFs, detects the inputs within them, and extracts both structured and unstructured data using tools built to handle each data type efficiently and securely. We study the efficiency of the proposed pipeline and compare it with enterprise solutions that also use LLMs. We show that our approach is superior in timely and accurate data extraction and transformation when analyzing data from varied sources under nested and/or interlinked input constraints.
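As a rough illustration of the extract, transform, and load stages named in the abstract, the sketch below moves text from a business document into a relational table. It is not the authors' IntelligentETL implementation: the `InvoiceRecord` type, the `invoices` table, the field names, and the sample text are all hypothetical, PDF ingestion is skipped (the input is assumed to be already-extracted text), and a simple regular-expression extractor stands in for the LLM agent purely so the example runs without external services.

```python
import re
import sqlite3
from dataclasses import dataclass


@dataclass
class InvoiceRecord:
    """Hypothetical target schema for one invoice row."""
    invoice_no: str
    vendor: str
    total: float


def extract_fields(text: str) -> dict:
    """Extract step. A regex stand-in for the LLM agent (assumption)."""
    patterns = {
        "invoice_no": r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)",
        "vendor": r"Vendor\s*:\s*(.+)",
        "total": r"Total\s*:\s*\$?([\d,]+\.\d{2})",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"field not found: {name}")
        fields[name] = match.group(1).strip()
    return fields


def transform(fields: dict) -> InvoiceRecord:
    """Transform step. Normalize raw strings into typed schema values."""
    return InvoiceRecord(
        invoice_no=fields["invoice_no"],
        vendor=fields["vendor"],
        total=float(fields["total"].replace(",", "")),
    )


def load(conn: sqlite3.Connection, record: InvoiceRecord) -> None:
    """Load step. Insert the record into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices "
        "(invoice_no TEXT PRIMARY KEY, vendor TEXT, total REAL)"
    )
    conn.execute(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        (record.invoice_no, record.vendor, record.total),
    )
    conn.commit()


def run_pipeline(text: str, conn: sqlite3.Connection) -> InvoiceRecord:
    """Run extract, transform, and load in sequence for one document."""
    record = transform(extract_fields(text))
    load(conn, record)
    return record


if __name__ == "__main__":
    sample = "Invoice No: INV-001\nVendor: Acme Supplies\nTotal: $1,234.50\n"
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(sample, connection))
```

Keeping the three stages as separate functions means the extractor could be replaced by a model-backed component without touching the schema normalization or the database code.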

