Domain Question Answering Alignment Optimization Method Based on Keyword Extraction and Hybrid Retrieval

Original title: 基于关键词抽取和混合检索的领域问答对齐优化方法

Authors: YANG Dongju (杨冬菊) [1,2]; YANG Kun (杨坤)

Affiliations: [1] School of Information Science and Technology, North China University of Technology, Beijing 100144, China; [2] Beijing Key Laboratory on Integration and Analysis of Large-Scale Stream, Beijing 100144, China

Source: Journal of North China University of Technology (《北方工业大学学报》), 2025, No. 1, pp. 52-62 (11 pages)

Funding: National Natural Science Foundation of China, International (Regional) Cooperation and Exchange Program (62061136006).

Abstract: Current domain question answering systems built on retrieval-augmented generation often answer poorly because of the semantic gap between the user query and the relevant knowledge in the knowledge base. To address this, the paper proposes an alignment optimization method based on keyword extraction and hybrid retrieval. First, a large language model extracts keywords from the user query. Second, the user query is concatenated with the extracted keywords to form a combined query; the combined query and the user query are fed to a sparse retrieval model and a dense retrieval model, respectively, to recall relevant documents. Third, the documents recalled by the two retrieval models are merged by set union and re-ranked. Finally, the re-ranked knowledge is passed through a text filter that extracts the key information, which is combined with the user query and given to the large language model to generate the answer returned to the user. Experimental results show that, compared with an alignment optimization method based on query rewriting, the proposed method improves Recall-Oriented Understudy for Gisting Evaluation Longest Common Subsequence (ROUGE-L) by 9.9 and 2.3 percentage points, and F1 by 4.1 and 1.7 percentage points, on a public traditional Chinese medicine question answering dataset and the general-domain question answering dataset CMRC2018, respectively. These results verify the effectiveness of the proposed method in improving the accuracy of domain question answering.
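The retrieval stages described in the abstract (keyword extraction, combined query, sparse and dense recall, union, re-ranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy documents, the stopword-based `extract_keywords` (a stand-in for the large language model), the character-trigram `embed` (a stand-in for a dense encoder), and the similarity-based re-ranker are all hypothetical. The text filter and answer generation steps are omitted.

```python
import math
from collections import Counter

# Toy knowledge base; the documents are invented for illustration only.
DOCS = [
    "ginseng tonifies qi and is used for fatigue",
    "licorice root harmonizes other herbs in a formula",
    "astragalus supports qi and strengthens the body surface",
    "the capital of france is paris",
]

# Hypothetical stopword list used by the keyword-extractor stand-in.
STOPWORDS = {"what", "which", "is", "are", "the", "a", "for", "of",
             "and", "in", "to", "does", "herb", "used"}


def tokenize(text):
    return text.lower().replace("?", " ").split()


def extract_keywords(query):
    """Stand-in for the LLM keyword extractor: keep non-stopword tokens."""
    return [t for t in tokenize(query) if t not in STOPWORDS]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Sparse retrieval: BM25 with a non-negative IDF, whitespace tokens."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    df = Counter(term for t in tokenized for term in set(t))
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def embed(text):
    """Stand-in for a dense encoder: character-trigram count vector."""
    s = " ".join(tokenize(text))
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k(scores, k):
    """Indices of the k best-scoring documents, dropping zero scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:k] if scores[i] > 0]


def hybrid_retrieve(query, docs, k=2):
    keywords = extract_keywords(query)
    combined = query + " " + " ".join(keywords)           # user query + keywords
    sparse_hits = top_k(bm25_scores(combined, docs), k)   # combined query -> sparse
    q_vec = embed(query)
    dense_hits = top_k([cosine(q_vec, embed(d)) for d in docs], k)  # user query -> dense
    candidates = set(sparse_hits) | set(dense_hits)       # union of both result sets
    # Placeholder re-ranker (assumption): similarity to the combined query.
    c_vec = embed(combined)
    reranked = sorted(candidates,
                      key=lambda i: cosine(c_vec, embed(docs[i])),
                      reverse=True)
    return [docs[i] for i in reranked]


if __name__ == "__main__":
    for doc in hybrid_retrieve("Which herb is used for fatigue?", DOCS):
        print(doc)
```

Appending the keywords to the query repeats the most informative terms, which raises their weight in the sparse scorer, while the dense retriever still sees the unmodified natural-language query. In a real system each stand-in would be replaced by an actual model.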

Keywords: retrieval-augmented generation; keyword extraction; domain question answering; hybrid retrieval

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
