Multi-strategy retrieval-augmented generation method for military domain knowledge question answering systems


Authors: ZHANG Yanping (张艳萍); CHEN Meifang (陈梅芳); TIAN Changhai (田昌海); YI Zibo (易子博); HU Wenpeng (胡文鹏); LUO Wei (罗威); LUO Zhunchen (罗准辰)

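Affiliations: [1] School of Mathematics and Physics, Hebei University of Engineering, Handan, Hebei 056038, China; [2] School of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China; [3] Information Research Center of Military Science, PLA Academy of Military Science, Beijing 100142, China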

Source: Journal of Computer Applications (《计算机应用》), 2025, No. 3, pp. 746-754 (9 pages)

Funding: National Natural Science Foundation of China, Youth Program (62206308).

Abstract: Military domain knowledge question answering systems based on Retrieval-Augmented Generation (RAG) have gradually become an important tool for modern intelligence personnel to collect and analyze intelligence. Current application strategies for RAG have two problems: hybrid retrieval has poor portability, and unnecessary query rewriting easily induces semantic drift. To address them, a Multi-Strategy Retrieval-Augmented Generation (MSRAG) method is proposed. First, a retrieval model is adaptively matched to the characteristics of the user's query to recall relevant text. Second, a text filter extracts the key text fragments that can answer the question. Third, the text filter assesses the validity of the retrieved content to decide whether to trigger query rewriting based on synonym expansion; the initial query is then merged with the rewritten information and passed to the retrieval controller for a more targeted second retrieval. Finally, the key text fragments that can answer the question are merged with the question and, through prompt engineering, fed to the answer generation model, whose response is returned to the user. Experimental results show that, compared with the convex linear combination RAG method, MSRAG improves ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation, Longest Common Subsequence) by 14.35 percentage points on the military domain dataset (Military) and by 5.83 percentage points on the Medical dataset. These results indicate that MSRAG has strong generality and portability, alleviates the semantic drift caused by unnecessary query rewriting, and helps large language models generate more accurate answers.
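The four stages named in the abstract can be read as a simple control flow. The sketch below is only an illustration of that flow under loud assumptions, not the authors' implementation: the abstract does not say how the retriever is selected, how the text filter works, or which models are used. The names `select_retriever`, `text_filter`, `expand_with_synonyms`, `msrag_answer`, the query-length rule, the toy retrievers, the stop-word list, and the synonym table are all hypothetical stand-ins.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical stop-word list; a real system would use a proper tokenizer.
STOPWORDS = {"a", "an", "the", "of", "is", "what", "which"}


def content_terms(text: str) -> Set[str]:
    """Lower-cased word tokens with stop words removed."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def lexical_retriever(query: str, corpus: List[str], k: int) -> List[str]:
    """Rank passages by the number of content terms shared with the query."""
    q = content_terms(query)
    return sorted(corpus, key=lambda p: len(q & content_terms(p)), reverse=True)[:k]


def bigram_retriever(query: str, corpus: List[str], k: int) -> List[str]:
    """Rank by character-bigram Jaccard similarity (toy stand-in for a dense retriever)."""
    def grams(text: str) -> Set[str]:
        t = text.lower()
        return {t[i:i + 2] for i in range(len(t) - 1)}

    q = grams(query)

    def score(passage: str) -> float:
        g = grams(passage)
        union = q | g
        return len(q & g) / len(union) if union else 0.0

    return sorted(corpus, key=score, reverse=True)[:k]


def select_retriever(query: str) -> Callable[[str, List[str], int], List[str]]:
    """Assumed rule: short keyword-style queries use the lexical retriever."""
    return lexical_retriever if len(query.split()) <= 3 else bigram_retriever


def text_filter(query: str, passages: List[str]) -> List[str]:
    """Assumed filter: keep passages sharing at least one content term with the query."""
    q = content_terms(query)
    return [p for p in passages if q & content_terms(p)]


def expand_with_synonyms(query: str, synonyms: Dict[str, List[str]]) -> str:
    """Merge the initial query with synonyms of its words (query unchanged if none)."""
    extra = [s for w in re.findall(r"\w+", query.lower()) for s in synonyms.get(w, [])]
    return " ".join([query] + extra)


def msrag_answer(query: str, corpus: List[str], synonyms: Dict[str, List[str]],
                 generate: Callable[[str], str], k: int = 2) -> dict:
    # Stages 1-2: adaptive retrieval, then filtering for usable evidence.
    evidence = text_filter(query, select_retriever(query)(query, corpus, k))
    rewritten = False
    # Stage 3: rewrite and re-retrieve only if the filter found nothing usable.
    if not evidence:
        merged = expand_with_synonyms(query, synonyms)
        evidence = text_filter(merged, select_retriever(merged)(merged, corpus, k))
        rewritten = True
    # Stage 4: merge evidence and the original question into a prompt.
    prompt = "Context:\n" + "\n".join(evidence) + "\nQuestion: " + query + "\nAnswer:"
    return {"rewritten": rewritten, "evidence": evidence, "answer": generate(prompt)}


# Hypothetical toy data for illustration only.
CORPUS = ["The frigate carries a towed sonar array.", "The tank uses composite armor."]
SYNONYMS = {"warship": ["frigate"], "sensors": ["sonar"]}
```

Calling `msrag_answer("tank armor", CORPUS, SYNONYMS, generate)` with any `generate` callable finds evidence on the first pass and skips rewriting, whereas `"warship sensors"` matches nothing initially and triggers the synonym-expanded second retrieval. Gating the rewrite on the filter's verdict is the point of the sketch: a query that already retrieves usable text is never altered, which is how unnecessary rewriting (and the semantic drift it may cause) is avoided.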

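Keywords: retrieval-augmented generation; military knowledge question answering; information retrieval; text filtering; query rewriting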

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
