
Question Answering System Based on Large Language Model Integrating Knowledge Graph and Vector Retrieval


Authors: WANG Shuai (王帅)[1]; HE Wen-chun (何文春)[1]; WANG Fu-di (王甫棣)[1]; ZHAO Xi-peng (赵希鹏)[1]; ZHOU Yuan-yang (周远洋) (National Meteorological Information Centre, Beijing 100081, China)

Affiliation: [1] National Meteorological Information Centre, Beijing 100081, China

Source: Science Technology and Engineering (《科学技术与工程》), 2024, No. 32, pp. 13902-13910 (9 pages)

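Funding: China Meteorological Administration Artificial Intelligence Meteorological Applications (气发〔2023〕78号); National Meteorological Information Centre Working Group on Artificial Intelligence Foundational Capacity Building (气信发〔2023〕110号); Meteorological Decision-Making Management Collaborative Support Construction Project (气函〔2023〕8号); National Meteorological Information Centre Innovation Team for "Integration of Meteorological Government Services and Digital Technology" (NMIC-2024-ZD17).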

Abstract: Deep learning has led to the wide use of large neural networks in natural language processing. However, question answering systems built on large models suffer from hallucination and from invalid or outdated knowledge. They also struggle to capture complex relationships between entities, which leads to biased results. To address these problems, this paper proposes a question answering system that uses fine-tuned large language models to integrate knowledge graphs and vector retrieval. The fine-tuned models are used to build and apply knowledge graphs and to support hybrid invocation of multiple models. The system also combines knowledge graph search with vector search to improve retrieval results. It consists of four modules: graph query model fine-tuning, knowledge graph extraction model fine-tuning, knowledge graph and vector database construction, and fusion retrieval and ranking. The graph query model generates graph query statements, and the knowledge graph extraction model extracts knowledge triples. The knowledge graph is stored in Neo4j, and text vectors are stored in the vector database postgres vector (PGVector). Fusion retrieval combines the results of knowledge graph search and vector search. On the Stanford Question Answering Dataset (SQuAD), the fusion retrieval method achieves an F1 score of 0.77. This outperforms knowledge graph retrieval alone (0.73) and vector retrieval alone (0.74). Expert evaluation also indicates that the fusion method gives the best results. The proposed system combines the respective strengths of large models, knowledge graphs, and vector retrieval, which improves the accuracy and comprehensiveness of question answering. Future research can address knowledge graph updates, reduction of model bias, and system optimization.
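The abstract states that fusion retrieval combines knowledge graph and vector search results, but it does not say how the two result sets are ranked. The sketch below assumes that each retriever returns a ranked list of candidate items, for example a serialized triple from Neo4j or a chunk ID from PGVector. It fuses the two lists with Reciprocal Rank Fusion (RRF), a common rank-level method. RRF, the constant `k = 60`, the function name `reciprocal_rank_fusion`, and the sample hits are illustrative assumptions. They are not the authors' documented method.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of candidate IDs with Reciprocal Rank Fusion (RRF).

    Assumption: RRF stands in for the paper's unspecified fusion ranking.
    Each candidate scores sum(1 / (k + rank)) over every list containing it,
    with 1-based ranks; a higher fused score means a better candidate.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] += 1.0 / (k + rank)
    # Highest score first; ties broken by candidate ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical hits: kg_hits would come from a Neo4j graph query,
    # vector_hits from a PGVector similarity search.
    kg_hits = ["fact_A", "fact_B", "fact_C"]
    vector_hits = ["fact_B", "fact_D", "fact_A"]
    for candidate, score in reciprocal_rank_fusion([kg_hits, vector_hits]):
        print(f"{candidate}\t{score:.5f}")
```

Candidates that both retrievers find (`fact_B` and `fact_A` here) move to the top of the fused list. RRF uses only ranks, so it avoids calibrating graph-match scores against vector similarity scores. That is one reason it is a convenient default when two retrievers produce scores that cannot be compared directly.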
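The abstract reports F1 values (0.77, 0.74, 0.73) without defining the metric. The sketch below assumes the standard SQuAD token-overlap F1, as computed by the SQuAD v1.1 evaluation script. Under that assumption, per-question F1 and a corpus-level average can be computed as follows. The function names and sample answers are hypothetical, and the paper may have computed F1 differently.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction, ground_truth):
    """Token-overlap F1 between one predicted answer and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection: each shared token counts at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_f1(predictions, references):
    """Mean over questions of the best F1 against any of that question's references.

    predictions: {question_id: answer}; references: {question_id: [answers]}.
    A missing prediction is treated as an empty answer and scores 0.
    """
    per_question = [
        max(f1_score(predictions.get(qid, ""), gold) for gold in golds)
        for qid, golds in references.items()
    ]
    return sum(per_question) / len(per_question)


if __name__ == "__main__":
    preds = {"q1": "Paris.", "q2": "the Eiffel Tower in Paris"}
    refs = {"q1": ["Paris"], "q2": ["Eiffel Tower", "The Eiffel Tower"]}
    print(round(corpus_f1(preds, refs), 4))
```

The corpus-level value lies on the same 0 to 1 scale as the figures in the abstract. The official SQuAD script reports the same quantity multiplied by 100.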

Keywords: knowledge graph; large language model; question answering system; fine-tuning; vector retrieval

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
