Open-Domain Question Answering Task Enhanced by Multiple Documents Refinement and Fusion

Authors: LI Bo; ZHU Tianyou; LIU Junjian; LYU Hongwei; CHEN Zhenyu (Big Data Center, State Grid Corporation of China, Beijing 100053, China)

Affiliation: [1] Big Data Center, State Grid Corporation of China, Beijing 100053

Source: Software Guide (《软件导刊》), 2024, No. 9, pp. 82-89 (8 pages)

Funding: Self-initiated science and technology project of the Big Data Center, State Grid Corporation of China (SGSJ0000YFJS2200047).

Abstract: Open-domain question answering (OpenQA) is a challenging task in natural language processing. Conventional machine learning and deep learning techniques are commonly used to retrieve candidate document fragments related to the question from the raw corpus for answer extraction. However, the candidate fragments retrieved by current methods tend to contain considerable noise and information irrelevant to the question, and mainstream OpenQA models fall short in accurately answering questions that require multiple document fragments as supporting evidence. To address this, the paper proposes an open-domain question answering method based on refinement and fusion of multiple documents (RFMD). In the retrieval stage, RFMD uses a Transformer-based document refinement module to reduce noise in the candidate documents. In the reading comprehension stage, RFMD employs a question answering module centered on text generation: by constructing a global attention mechanism across document fragments, it integrates information from multiple relevant fragments to accurately answer questions that require several fragments as supporting evidence. RFMD achieves EM scores of 45.8% and 63.4% on the NaturalQuestions and TriviaQA datasets respectively, verifying the effectiveness and superiority of the model on OpenQA tasks.

Keywords: open-domain question answering; pre-trained model; generative model; similarity score; prompt design

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]
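
The abstract describes a two-stage flow (refine the retrieved fragments, then read several of them jointly) and reports exact-match (EM) scores. The following is a minimal Python sketch of that flow, not the authors' implementation. Everything named here is an assumption for illustration: `overlap_score` is a toy token-overlap scorer standing in for the Transformer-based refinement module, `build_reader_inputs` only shows one common way to format per-fragment inputs for a generative reader, and `exact_match` uses the answer normalization commonly applied in OpenQA evaluation, which may differ from what the paper uses.

```python
import re
import string


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(question: str, passage: str) -> float:
    """Toy relevance score: fraction of question tokens present in the passage.

    Assumption: this stands in for a learned Transformer reranker.
    """
    q_tokens = tokenize(question)
    if not q_tokens:
        return 0.0
    p_tokens = set(tokenize(passage))
    return sum(t in p_tokens for t in q_tokens) / len(q_tokens)


def rerank(question: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Keep the top_k passages by score (ties keep their original order)."""
    ranked = sorted(passages, key=lambda p: overlap_score(question, p), reverse=True)
    return ranked[:top_k]


def build_reader_inputs(question: str, passages: list[str]) -> list[str]:
    """Format one input string per passage.

    A generative reader would encode each string separately and let the
    decoder attend over all encoded passages at once (hypothetical format).
    """
    return [f"question: {question} context: {p}" for p in passages]


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))
```

A corpus-level EM figure such as those quoted in the abstract would be the mean of `exact_match` over all test questions, expressed as a percentage.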
