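Research on AMR-to-Text Generation Based on Multi-task Pre-training (original Chinese title: 基于多任务预训练的AMR文本生成研究). Times cited: 2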

Improving AMR-to-text Generation with Multi-task Pre-training

Authors: XU Dong-Qin (徐东钦), LI Jun-Hui (李军辉)[1], ZHU Mu-Hua (朱慕华), ZHOU Guo-Dong (周国栋)[1]

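Affiliations: [1] School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China; [2] Tencent News, Tencent Technology (Beijing) Co., Ltd., Beijing 100001, China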

Source: Journal of Software (《软件学报》), 2021, No. 10, pp. 3036-3050 (15 pages)

Funding: National Key R&D Program of China (2017YFB1002101); National Natural Science Foundation of China (61876120).

Abstract: Given an AMR (abstract meaning representation) graph, AMR-to-text generation aims to generate text that expresses the same meaning. Related studies show that the performance of AMR-to-text generation depends heavily on the size of the manually annotated corpus. To reduce this dependence, this study proposes a multi-task pre-training approach for AMR-to-text generation. Specifically, based on a large-scale automatically annotated AMR corpus, three related pre-training tasks are defined: AMR denoising auto-encoding, sentence denoising auto-encoding, and AMR-to-text generation itself. In addition, on top of the pre-trained models, the vanilla fine-tuning method is extended to multi-task fine-tuning, so that the final model performs well not only on AMR-to-text generation but also on the pre-training tasks. Experiments on two AMR benchmarks show that, using 0.39M automatically annotated sentences, the proposed pre-training approach substantially improves AMR-to-text generation, by 12.27 BLEU on AMR 2.0 and 7.57 BLEU on AMR 3.0, reaching 40.30 and 38.97 BLEU, respectively. To the best of our knowledge, the AMR 2.0 result is the best reported so far, and this is the first reported AMR-to-text generation result on AMR 3.0.
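The abstract names three pre-training tasks built from automatically parsed (AMR, sentence) pairs but does not describe the noising scheme. The sketch below illustrates, under stated assumptions, how one such pair could be turned into three sequence-to-sequence training examples. Everything here is hypothetical and not the paper's method: the simplified linearized AMR string, the `mask_tokens` noise function (each token replaced by a `<mask>` token with probability `p`), the task tags, and the function names.

```python
import random
from typing import List, Tuple

MASK = "<mask>"  # hypothetical mask token; the paper's noise scheme is not given here


def mask_tokens(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Assumed noise function: replace each token with MASK with probability p."""
    return [MASK if rng.random() < p else tok for tok in tokens]


def build_pretraining_examples(
    amr: str, sentence: str, p: float = 0.15, seed: int = 0
) -> List[Tuple[str, str, str]]:
    """Turn one auto-parsed (linearized AMR, sentence) pair into
    (task_tag, source, target) triples, one per pre-training task."""
    rng = random.Random(seed)
    amr_toks = amr.split()
    sent_toks = sentence.split()
    return [
        # AMR denoising auto-encoder: noisy AMR -> original AMR
        ("amr_dae", " ".join(mask_tokens(amr_toks, p, rng)), amr),
        # Sentence denoising auto-encoder: noisy sentence -> original sentence
        ("sent_dae", " ".join(mask_tokens(sent_toks, p, rng)), sentence),
        # AMR-to-text generation itself: AMR -> sentence
        ("amr2text", amr, sentence),
    ]


if __name__ == "__main__":
    # Simplified, hypothetical linearization of an AMR graph
    amr = "( want-01 :ARG0 ( boy ) :ARG1 ( go-02 :ARG0 boy ) )"
    sentence = "The boy wants to go ."
    for task, src, tgt in build_pretraining_examples(amr, sentence, p=0.3):
        print(task, "|", src, "->", tgt)
```

Keeping a task tag on every triple is one way (again an assumption) to let a single sequence-to-sequence model share parameters across all three tasks, which is also what a multi-task fine-tuning stage would need in order to keep serving the pre-training tasks.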

Keywords: AMR; AMR-to-text generation; multi-task pre-training; sequence-to-sequence model

CLC number: TP18 [Automation and Computer Technology / Control Theory and Control Engineering]
