Unsupervised text simplification method based on a sequence-to-sequence model (Times cited: 1)

Unsupervised text simplification with sequence-to-sequence model

Authors: Li Tianyu, Li Yun[1], Qian Zhenyu (School of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225137, China)

Affiliation: [1] School of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225137

Source: Application Research of Computers, 2021, No. 1, pp. 93-96, 100 (5 pages)

Funding: National Natural Science Foundation of China (61703362); Jiangsu Province Postgraduate Research and Practice Innovation Program (SJCX19_0888).

Abstract: Training a text simplification model based on sequence-to-sequence (seq2seq) learning requires a large-scale parallel corpus, but large, well-annotated parallel corpora are difficult to obtain. To address this, this paper proposes an unsupervised text simplification method whose training requires only unlabeled corpora of complex sentences and of simple sentences. First, denoising autoencoders are trained separately on the simple-sentence corpus and the complex-sentence corpus, yielding a simple-sentence autoencoder and a complex-sentence autoencoder. Next, the two autoencoders are combined to form an initial text simplification model and an initial text complication model. Finally, back-translation converts the unsupervised simplification problem into a supervised one, and the simplification model is optimized iteratively. Experiments on a standard dataset show that the method outperforms existing unsupervised models on the common metrics BLEU and SARI, and that it simplifies text at both the lexical and the syntactic level.
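The three training steps in the abstract can be sketched as follows. This is a minimal, framework-free sketch under stated assumptions, not the authors' implementation: sentences are token lists, models are plain callables, and `fit_simplify` / `fit_complicate` are hypothetical trainers standing in for real seq2seq training. The noise model in `add_noise` (word dropout plus local shuffling) and the encoder/decoder pairing in `compose` are assumptions; the abstract specifies neither.

```python
import random
from typing import Callable, List, Optional, Tuple

Sentence = List[str]
Pair = Tuple[Sentence, Sentence]          # (source, target)
Model = Callable[[Sentence], Sentence]    # stands in for a seq2seq model
Trainer = Callable[[List[Pair]], Model]   # hypothetical: fits a model on pairs


def add_noise(tokens: Sentence, drop_prob: float = 0.1, shuffle_k: float = 3.0,
              rng: Optional[random.Random] = None) -> Sentence:
    """Corrupt a sentence for denoising-autoencoder training.

    Assumed noise model: drop each word with probability drop_prob, then
    shuffle locally so each word moves only a few positions.
    """
    rng = rng if rng is not None else random.Random()
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept:
        kept = list(tokens[:1])  # keep one token so the input is not empty
    keys = [i + rng.uniform(0, shuffle_k) for i in range(len(kept))]
    return [tok for _, tok in sorted(zip(keys, kept), key=lambda p: p[0])]


def denoising_pairs(corpus: List[Sentence], rng: random.Random) -> List[Pair]:
    """Step 1: (noisy, clean) training pairs for one autoencoder.

    Built separately from the simple corpus and from the complex corpus.
    """
    return [(add_noise(s, rng=rng), s) for s in corpus]


def compose(encode: Callable, decode: Callable) -> Model:
    """Step 2 (assumed wiring): the encoder of one autoencoder feeds the decoder
    of the other, e.g. complex encoder + simple decoder -> initial simplifier."""
    return lambda tokens: decode(encode(tokens))


def back_translation_round(simplify: Model, complicate: Model,
                           complex_corpus: List[Sentence],
                           simple_corpus: List[Sentence],
                           fit_simplify: Trainer,
                           fit_complicate: Trainer) -> Tuple[Model, Model]:
    """Step 3: one back-translation iteration.

    Real simple sentences are complicated by the current complication model and
    the simplifier is trained to map them back, so its targets are always real
    simple sentences. The complication model is updated symmetrically.
    """
    simplify_pairs = [(complicate(s), s) for s in simple_corpus]
    complicate_pairs = [(simplify(c), c) for c in complex_corpus]
    return fit_simplify(simplify_pairs), fit_complicate(complicate_pairs)


def train_unsupervised(complex_corpus: List[Sentence], simple_corpus: List[Sentence],
                       simplify: Model, complicate: Model,
                       fit_simplify: Trainer, fit_complicate: Trainer,
                       rounds: int = 3) -> Model:
    """Iterate back-translation, starting from the composed initial models."""
    for _ in range(rounds):
        simplify, complicate = back_translation_round(
            simplify, complicate, complex_corpus, simple_corpus,
            fit_simplify, fit_complicate)
    return simplify


def lookup_trainer(pairs: List[Pair]) -> Model:
    """Toy stand-in for seq2seq training: memorise source -> target pairs."""
    table = {tuple(src): tgt for src, tgt in pairs}
    return lambda tokens: table.get(tuple(tokens), tokens)


if __name__ == "__main__":
    simple = [["the", "cat", "sat"]]
    complex_ = [["the", "feline", "was", "seated"]]
    identity = compose(lambda t: t, lambda t: t)  # toy initial simplifier

    def toy_complicate(tokens: Sentence) -> Sentence:
        return ["indeed"] + tokens  # hypothetical initial complicator

    simplifier = train_unsupervised(complex_, simple, identity, toy_complicate,
                                    lookup_trainer, lookup_trainer, rounds=1)
    print(simplifier(["indeed", "the", "cat", "sat"]))  # ['the', 'cat', 'sat']
```

The point the sketch is meant to show is the asymmetry inside `back_translation_round`: each model is trained only on pairs whose target side is real data and whose source side is synthetic, which is what lets an unsupervised problem be trained with an ordinary supervised objective.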

Keywords: text simplification; unsupervised; sequence-to-sequence model; denoising autoencoder

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

