The cost and strict input-format requirements of GraphRAG make it inefficient for processing large documents. This paper proposes an alternative approach for constructing a knowledge graph (KG) from a PDF document, with a focus on simplicity and cost-effectiveness. The process involves splitting the document into chunks, extracting concepts from each chunk using a large language model (LLM), and building relationships based on the co-occurrence of concepts within the same chunk. Unlike traditional named entity recognition (NER), which identifies entities such as “Shanghai”, the proposed method identifies concepts, such as “convenient transportation in Shanghai”, which prove more meaningful for KG construction. Each edge in the KG represents a relationship between concepts occurring in the same text chunk. The process is computationally inexpensive, relying on locally hosted tools, namely the Mistral 7B OpenOrca Instruct model served through Ollama, so the entire graph-generation process is cost-free. A method is introduced for assigning weights to relationships, grouping similar concept pairs, and summarizing multiple relationships into a single edge carrying the combined weight and relation details. Additionally, node degrees and communities are computed to drive node sizing and coloring. This approach offers a scalable, cost-effective solution for generating meaningful knowledge graphs from large documents, achieving results comparable to GraphRAG while remaining accessible on personal machines.
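The sketch below illustrates the pipeline outlined above under simplifying assumptions: it uses the Ollama Python client with a local "mistral-openorca" model for concept extraction and networkx for graph construction. The prompt wording, chunking parameters, and function names are illustrative choices, not the authors' exact implementation, and the LLM-based summarization of grouped relationships is reduced here to simple co-occurrence counting.

```python
# Minimal sketch: chunk the text, extract concepts per chunk with a local LLM,
# and connect concepts that co-occur in the same chunk.
import json
from collections import Counter
from itertools import combinations

import networkx as nx
import ollama  # assumes a local Ollama server with "mistral-openorca" pulled


def split_into_chunks(text: str, size: int = 1500, overlap: int = 150) -> list[str]:
    """Naive fixed-size character chunking with overlap (illustrative parameters)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def extract_concepts(chunk: str, model: str = "mistral-openorca") -> list[str]:
    """Ask the local LLM for key concepts in a chunk, returned as a JSON array."""
    prompt = (
        "Extract the key concepts (short descriptive phrases, not bare named "
        "entities) from the text below. Respond with a JSON array of strings "
        "only.\n\n" + chunk
    )
    response = ollama.generate(model=model, prompt=prompt)
    try:
        return [c.strip().lower() for c in json.loads(response["response"])]
    except (json.JSONDecodeError, TypeError):
        return []  # skip chunks whose output cannot be parsed


def build_graph(chunks: list[str]) -> nx.Graph:
    """Edge = co-occurrence of two concepts in a chunk; weight = co-occurrence count."""
    weights: Counter = Counter()
    for chunk in chunks:
        concepts = extract_concepts(chunk)
        for a, b in combinations(sorted(set(concepts)), 2):
            weights[(a, b)] += 1

    graph = nx.Graph()
    for (a, b), w in weights.items():
        graph.add_edge(a, b, weight=w)

    # Node degree for sizing, greedy-modularity communities for coloring.
    nx.set_node_attributes(graph, dict(graph.degree()), "size")
    for i, community in enumerate(nx.community.greedy_modularity_communities(graph)):
        for node in community:
            graph.nodes[node]["group"] = i
    return graph
```

In this sketch, repeated co-occurrences of the same concept pair simply increase the edge weight; the paper's fuller method would additionally merge the relation texts of grouped pairs into a single summarized edge attribute.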