Review of Graph Retrieval-Augmented Generation Research

DOI: 10.12677/airr.2025.142040, PP. 402-413

Keywords: GraphRAG, Large Language Model, Retrieval-Augmented Generation


Abstract:

In recent years, Retrieval-Augmented Generation (RAG) has achieved remarkable success in enhancing the performance of large language models (LLMs) by integrating external knowledge bases. By referencing such knowledge bases, RAG can refine LLM outputs, effectively addressing issues such as hallucination, a lack of domain-specific knowledge, and outdated information. However, the complex relational structures among entities in these knowledge bases pose challenges for conventional RAG. In response, GraphRAG exploits the structural information among entities to achieve more precise and comprehensive retrieval, capturing relational knowledge and enabling more accurate, contextually grounded generation. This paper surveys the techniques and underlying principles of GraphRAG, examines its downstream tasks, application domains, and evaluation criteria, and finally discusses future research directions, offering an outlook on emerging trends in the technology.
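
To make the retrieval step concrete, the sketch below illustrates the general GraphRAG idea described in the abstract: match query entities against a knowledge graph, expand a k-hop subgraph around them, and linearize the retrieved triples into an LLM prompt. It is a minimal sketch assuming a toy networkx graph and naive substring entity matching; the graph contents, function names, and matcher are invented for illustration, and surveyed GraphRAG systems instead rely on learned retrievers, graph encoders, and LLM generators.

# Minimal GraphRAG-style retrieval sketch (illustrative only, not the surveyed systems' method).
# Assumptions: a toy knowledge graph in networkx, naive substring entity matching,
# and k-hop subgraph expansion followed by triple linearization into a prompt.
import networkx as nx

def build_toy_graph() -> nx.MultiDiGraph:
    """Build a tiny knowledge graph of (head, relation, tail) triples."""
    g = nx.MultiDiGraph()
    triples = [
        ("GraphRAG", "extends", "RAG"),
        ("RAG", "augments", "LLM"),
        ("RAG", "retrieves_from", "knowledge base"),
        ("GraphRAG", "retrieves_from", "knowledge graph"),
        ("knowledge graph", "encodes", "entity relations"),
    ]
    for head, rel, tail in triples:
        g.add_edge(head, tail, relation=rel)
    return g

def retrieve_subgraph(g: nx.MultiDiGraph, query: str, hops: int = 1) -> list[tuple[str, str, str]]:
    """Match query terms to entities, expand k-hop neighborhoods, and return the induced triples."""
    seeds = [n for n in g.nodes if n.lower() in query.lower()]
    triples = []
    for seed in seeds:
        # undirected=True so both incoming and outgoing relations are reachable
        sub = nx.ego_graph(g, seed, radius=hops, undirected=True)
        triples += [(u, d["relation"], v) for u, v, d in sub.edges(data=True)]
    return sorted(set(triples))

def build_prompt(query: str, triples: list[tuple[str, str, str]]) -> str:
    """Linearize retrieved triples into a context block prepended to the question."""
    context = "\n".join(f"({h}, {r}, {t})" for h, r, t in triples)
    return f"Context triples:\n{context}\n\nQuestion: {query}\nAnswer:"

if __name__ == "__main__":
    g = build_toy_graph()
    query = "How does GraphRAG differ from RAG?"
    print(build_prompt(query, retrieve_subgraph(g, query, hops=1)))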


