A Data Augmentation Method Based on Subtree Exchange for Neural Machine Translation

Authors: CHI Chuncheng[1], LI Manjing[1], YAN Hong[2], LI Fuxue[2]

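Affiliations: [1] College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang, Liaoning 110142, China; [2] College of Electrical Engineering, Yingkou Institute of Technology, Yingkou, Liaoning 115014, China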

Source: Journal of Anshan Normal University, 2023, No. 2, pp. 64-70 (7 pages)

Funding: Natural Science Foundation of Liaoning Province (2021-YKLH-12; 2022-YKLH-18).

Abstract: Neural machine translation (NMT) performs well when rich bilingual resources are available, but its translation quality drops sharply in low-resource settings. For low-resource translation tasks, this paper proposes a data augmentation method based on subtree exchange. First, a syntactic tree is generated for each target-side sentence; second, a subtree exchange algorithm is run on these trees to generate new pseudo-monolingual data; finally, back-translation is applied to generate the corresponding source-side sentences, forming a pseudo-parallel corpus. Experimental results on several translation tasks show that, compared with the baseline model and existing data augmentation methods, the subtree-exchange data augmentation method significantly improves translation quality.
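The three steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under assumptions the abstract does not state: target-side parses are Penn-Treebank-style bracketed constituency trees, each exchange swaps one randomly chosen pair of subtrees that carry the same phrase label (NP or VP by default), and `back_translate` is a hypothetical, caller-supplied target-to-source model. The names `parse`, `subtree_exchange`, and `augment` are illustrative only and are not the authors' code.

```python
from __future__ import annotations

import copy
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    word: str | None = None  # set only on preterminals such as (NN cat)

    def leaves(self) -> list[str]:
        if self.word is not None:
            return [self.word]
        return [w for c in self.children for w in c.leaves()]


def parse(s: str) -> Node:
    """Parse a bracketed tree like '(S (NP (DT the) (NN cat)) (VP (VBD sat)))'."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        if tokens[pos] != "(":  # preterminal: (TAG word)
            node = Node(label, word=tokens[pos])
            pos += 1
        else:
            node = Node(label)
            while tokens[pos] == "(":
                node.children.append(read())
        if tokens[pos] != ")":
            raise ValueError(f"expected ')' at token {pos}")
        pos += 1
        return node

    return read()


def phrase_nodes(root: Node) -> list[Node]:
    """All phrase-level nodes below the root (POS preterminals excluded)."""
    out, stack = [], list(root.children)
    while stack:
        n = stack.pop()
        if n.word is None:
            out.append(n)
            stack.extend(n.children)
    return out


def subtree_exchange(t1: Node, t2: Node, labels=("NP", "VP"),
                     rng: random.Random | None = None):
    """Swap one same-label subtree pair between copies of t1 and t2.

    Assumption: only subtrees sharing a label in `labels` may be swapped.
    Returns the two new trees, or None if no compatible pair exists.
    """
    rng = rng or random.Random()
    a, b = copy.deepcopy(t1), copy.deepcopy(t2)  # leave the originals intact
    cands = [(x, y) for x in phrase_nodes(a) for y in phrase_nodes(b)
             if x.label == y.label and x.label in labels]
    if not cands:
        return None
    x, y = rng.choice(cands)
    x.children, y.children = y.children, x.children
    return a, b


def augment(trees: list[Node], back_translate, n_pairs: int, seed: int = 0):
    """Build (pseudo-source, pseudo-target) pairs from parsed target sentences."""
    rng = random.Random(seed)
    pseudo_parallel = []
    for _ in range(n_pairs):
        t1, t2 = rng.sample(trees, 2)
        result = subtree_exchange(t1, t2, rng=rng)
        if result is None:
            continue
        for tree in result:
            tgt = " ".join(tree.leaves())  # new pseudo-monolingual sentence
            pseudo_parallel.append((back_translate(tgt), tgt))
    return pseudo_parallel
```

For example, exchanging the NP subtrees of "the cat sat" and "she ran fast" yields "she sat" and "the cat ran fast", each of which would then be passed to a target-to-source model. Restricting swaps to subtrees with the same phrase label is one simple way to keep recombined sentences roughly grammatical; the paper's actual selection rules may differ.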

Keywords: subtree exchange; neural machine translation; data augmentation

Chinese Library Classification: TP391.1 [Automation and Computer Technology / Computer Application Technology]