Authors: YANG Dongju [1,2], YANG Kun
Affiliations: [1] School of Information Science and Technology, North China University of Technology, Beijing 100144, China; [2] Beijing Key Laboratory on Integration and Analysis of Large-Scale Stream, Beijing 100144, China
Source: Journal of North China University of Technology, 2025, No. 1, pp. 52-62 (11 pages)
Funding: National Natural Science Foundation of China, International (Regional) Cooperation and Exchange Program (62061136006).
Abstract: In domain question answering based on retrieval-augmented generation (RAG), the semantic gap between a user query and the relevant knowledge in the knowledge base degrades answer quality. To address this, the paper proposes an alignment optimization method based on keyword extraction and hybrid retrieval. First, a large language model extracts keywords from the user query. Second, the user query is concatenated with the extracted keywords to form a combined query, and the combined query and the user query are fed into a sparse retrieval model and a dense retrieval model, respectively, to recall relevant documents. Third, the documents recalled by the retrieval models are merged by taking their union and then re-ranked. Finally, the re-ranked knowledge is passed through a text filter that extracts the key information, which is combined with the user query and given to the large language model to generate the answer returned to the user. Experiments show that, compared with an alignment optimization method based on query rewriting, the proposed method improves the Recall-Oriented Understudy for Gisting Evaluation Longest Common Subsequence (ROUGE-L) score by 9.9 and 2.3 percentage points, and the F1 score by 4.1 and 1.7 percentage points, on a public traditional Chinese medicine question answering dataset and the general-domain question answering dataset CMRC2018, respectively. These results support the effectiveness of the proposed method in improving the accuracy of domain question answering.
CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]
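
Illustrative sketch: the retrieval stages summarized in the abstract (keyword extraction, combined query, sparse and dense recall, union, re-ranking) can be pictured roughly as below. This is not the authors' implementation. Every function name here is hypothetical, the keyword extractor is a vocabulary lookup standing in for the large language model, the sparse retriever is plain term counting rather than a real sparse model, the dense encoder is a character-bigram vector rather than a neural embedding, and re-ranking simply reuses the dense similarity. The text filter and answer generation steps are omitted.

```python
import math
from collections import Counter


def extract_keywords(query, vocabulary):
    """Hypothetical stand-in for the LLM extractor: keep query tokens found in a vocabulary."""
    return [tok for tok in query.lower().split() if tok in vocabulary]


def sparse_scores(query, docs):
    """Toy sparse retriever: number of document tokens that also appear in the query."""
    q_tokens = set(query.lower().split())
    return [sum(1 for tok in doc.lower().split() if tok in q_tokens) for doc in docs]


def embed(text):
    """Toy dense encoder: character-bigram counts (assumption, not a neural model)."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(scores, k):
    """Indices of the k best-scoring documents, dropping zero-score entries."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:k] if scores[i] > 0]


def hybrid_retrieve(query, docs, vocabulary, k=2):
    """Return document indices after sparse + dense recall, union, and re-ranking."""
    # Steps 1-2: extract keywords and append them to the user query.
    keywords = extract_keywords(query, vocabulary)
    combined_query = query + " " + " ".join(keywords)

    # Step 2 (cont.): combined query -> sparse retriever, user query -> dense retriever.
    sparse_hits = top_k(sparse_scores(combined_query, docs), k)
    query_vec = embed(query)
    dense_sims = [cosine(query_vec, embed(doc)) for doc in docs]
    dense_hits = top_k(dense_sims, k)

    # Step 3: union of both candidate sets, then re-rank (here by dense similarity,
    # with the index as a tie-breaker so the order is deterministic).
    candidates = set(sparse_hits) | set(dense_hits)
    return sorted(candidates, key=lambda i: (-dense_sims[i], i))


docs = [
    "ginseng root is used to boost energy",
    "the stock market fell sharply today",
    "licorice root soothes a sore throat",
]
vocabulary = {"ginseng", "licorice", "root"}
print(hybrid_retrieve("what is ginseng used for", docs, vocabulary))
```

Taking the union before re-ranking means a document found by only one of the two retrievers still reaches the re-ranker, which is the point of combining sparse and dense recall.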