Question Generation Model Based on Text Knowledge Enhancement

Cited by: 1


Authors: CHEN Jiayu; WANG Yuanlong[1]; ZHANG Hu[1] (School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China)

Affiliation: [1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China

Source: Computer Engineering (《计算机工程》), 2024, No. 6, pp. 86-93 (8 pages)

Funding: National Natural Science Foundation of China (62176145).

Abstract: Pre-trained language models, trained on large-scale data with very large computing power, can learn a great deal of knowledge from unstructured text. To address the limited information contained in triples, a question generation method is proposed that uses pre-trained language models to enrich this knowledge. First, a text knowledge generator is designed that draws on the knowledge embedded in a pre-trained language model to convert the information in the triples into subgraph descriptions, enriching the semantics of the triples. Next, a question type predictor predicts the question word, which accurately locates the domain of the answer, so that semantically correct questions are generated and the generation process is better controlled. Finally, a controlled generation framework constrains the key entities and the question word so that both appear in the generated question, making the generated questions more accurate. The performance of the proposed model is verified on the public datasets WebQuestion and PathQuestion. Experimental results show that, compared with the existing model LFKQG, the proposed model improves BLEU-4, METEOR, and ROUGE-L by 0.28, 0.16, and 0.22 percentage points, respectively, on the WebQuestion dataset, and by 0.8, 0.39, and 0.46 percentage points, respectively, on the PathQuestion dataset.
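
The constraint described in the abstract, that the key entities and the predicted question word must both appear in the generated question, can be illustrated with a minimal sketch. This is not the authors' controlled generation framework, which the abstract does not specify; it is a simple post-hoc filter over candidate questions. All names here (`Triple`, `linearize`, `satisfies_constraints`, `filter_candidates`) and the sample triple are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A knowledge-graph fact (hypothetical container, not from the paper)."""
    head: str
    relation: str
    tail: str


def linearize(triples):
    """Flatten triples into one string that a text generator could take as input."""
    return " ; ".join(f"{t.head} | {t.relation} | {t.tail}" for t in triples)


def satisfies_constraints(question, key_entities, question_word):
    """Return True only if the question word and every key entity occur in the question."""
    text = question.lower()
    # Split into words so that "where" does not match inside "somewhere".
    tokens = text.replace("?", " ").split()
    has_question_word = question_word.lower() in tokens
    has_entities = all(entity.lower() in text for entity in key_entities)
    return has_question_word and has_entities


def filter_candidates(candidates, key_entities, question_word):
    """Keep only the candidate questions that meet both constraints."""
    return [q for q in candidates
            if satisfies_constraints(q, key_entities, question_word)]


# Assumed example data, for illustration only.
triples = [Triple("Barack Obama", "place_of_birth", "Honolulu")]
candidates = [
    "Where was Barack Obama born?",   # question word and entity present
    "Who is Barack Obama?",           # wrong question word
    "Where was he born?",             # key entity missing
]
kept = filter_candidates(candidates, ["Barack Obama"], "where")
```

With this sample data, only the first candidate survives the filter. A decoder-level approach would enforce the same constraints during generation rather than after it; the filter is shown only because it is the simplest correct instance of the constraint.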

Keywords: natural language understanding; question generation; knowledge graph; pre-trained language model; knowledge enhancement

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
