Adaptive Pipeline Unsupervised Question Generation Method

Original title: 自适应的流水线式无监督问题生成方法


Authors: Li Kunze; Zhang Yu [1]

Affiliation: [1] Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology, Harbin 150001

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2025, No. 4, pp. 905-914 (10 pages)

Funding: National Natural Science Foundation of China (62476066, 62277002).

Abstract: In traditional question-answering tasks, models generally require large amounts of training data, and annotating that data costs considerable time and labor. Unsupervised question generation is an effective way to address this scarcity of training data, but the questions it currently produces tend to be difficult to answer, limited in variety, and semantically unclear. To address these issues, we propose ADVICE, an adaptive multi-module pipeline model whose modules improve on existing methods in three respects: question answerability, question diversity, and grammatical correctness. The answerability module uses coreference resolution and named entity recognition to make questions more answerable. The diversity module applies different rules to different ways of asking a question, increasing the diversity of both question types and answer types. The grammatical correctness module trains a T5-based grammatical error correction model targeted at questions, and a filtering module screens the corrected question-answer data. Finally, a classifier is trained to automatically select the modules that are needed. Experiments show that, with the improved question generation method, downstream question-answering models gain an average of 2.9 percentage points in EM (exact match) and 4.4 percentage points in F1 on the SQuAD dataset.
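The abstract reports its gains in EM and F1 on SQuAD. As a rough illustration of what those two numbers measure, the sketch below follows the commonly used SQuAD v1.1 conventions (lowercasing, removing punctuation and English articles, then comparing tokens). It is not taken from the paper: the abstract does not say which evaluation script the authors used, and the function names here are illustrative.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, otherwise 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower!"))
    print(f1_score("in Paris France", "Paris"))
```

Dataset-level scores are the averages of these per-question values, usually expressed on a 0 to 100 scale, which is why an improvement is naturally stated in percentage points rather than as a relative percentage.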

Keywords: unsupervised learning; question generation; pre-trained model; deep learning; natural language processing

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
