Authors: ZHU Jiahui; HAN Ren[1]; ZHANG Sheng[1]; CHEN Sizhou
Affiliations: [1] School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; [2] Blockchain Industry Institute, Chengdu University of Information Technology, Chengdu 610225, China
Source: 《计算机工程与应用》 (Computer Engineering and Applications), 2025, No. 8, pp. 163-172 (10 pages)
Fund: National Key Research and Development Program of China (2018YFB1700900).
Abstract: Using contextual representations from pre-trained models such as BERT to enhance neural machine translation can significantly improve low-resource translation performance. Existing methods for integrating BERT fall into two main categories: initializing the encoder parameters and fine-tuning, or incorporating contextual embeddings into the translation model. The former involves a large number of trainable parameters and is prone to catastrophic forgetting, while the latter tends to be complex. In addition, both approaches use only the source-side BERT representation and do not fully exploit the dual nature of machine translation. To address these issues, this paper proposes a compressed attention module with linear complexity. First, mBERT contextual embeddings are compressed through learnable compression vectors and aligned to the semantic space of the translation model. Second, the compressed vectors are concatenated with the encoder's input vectors to enhance the source-side semantic representation. Finally, the proposed dual multi-granularity training scheme simultaneously strengthens the bilingual representation ability of the translation model. Experimental results on two public low-resource spoken-language translation datasets from IWSLT show that the method achieves BLEU improvements of 2.07 to 2.66 over the Transformer baseline, validating its effectiveness.
Keywords: mBERT knowledge enhancement; compressed attention; low-resource machine translation; dual training
CLC Number: TP391 [Automation and Computer Technology / Computer Application Technology]
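The abstract outlines the core fusion step: learnable compression vectors attend over the mBERT contextual embeddings in a single pass that is linear in the source length, and the resulting fixed number of compressed vectors is projected into the translation model's semantic space and concatenated with the encoder input. The sketch below illustrates that idea only; the module name CompressedAttention, the slot count num_slots, the dimensions d_bert and d_model, and the prepend-style concatenation are assumptions made for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of the compressed-attention fusion described in the abstract.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class CompressedAttention(nn.Module):
    """Compress a variable-length sequence of mBERT embeddings into a fixed
    number of slots using learnable query vectors (linear in source length)."""

    def __init__(self, d_bert: int = 768, d_model: int = 512, num_slots: int = 8):
        super().__init__()
        # Learnable compression vectors act as a fixed set of queries.
        self.slots = nn.Parameter(torch.randn(num_slots, d_bert) * 0.02)
        self.key = nn.Linear(d_bert, d_bert)
        self.value = nn.Linear(d_bert, d_bert)
        # Align the compressed representations to the NMT model's semantic space.
        self.align = nn.Linear(d_bert, d_model)

    def forward(self, bert_states: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # bert_states: (batch, src_len, d_bert); pad_mask: (batch, src_len), True = padding.
        k = self.key(bert_states)                                   # (B, L, d_bert)
        v = self.value(bert_states)                                 # (B, L, d_bert)
        # One pass over the source length L: cost is O(num_slots * L), linear in L.
        scores = torch.einsum("sd,bld->bsl", self.slots, k) / k.size(-1) ** 0.5
        scores = scores.masked_fill(pad_mask.unsqueeze(1), float("-inf"))
        attn = scores.softmax(dim=-1)                               # (B, num_slots, L)
        compressed = torch.einsum("bsl,bld->bsd", attn, v)          # (B, num_slots, d_bert)
        return self.align(compressed)                               # (B, num_slots, d_model)


# Usage sketch: prepend the compressed slots to the encoder token embeddings,
# so the Transformer encoder sees [slots ; source tokens].
if __name__ == "__main__":
    B, L = 2, 17
    bert_states = torch.randn(B, L, 768)            # stand-in for frozen mBERT outputs
    pad_mask = torch.zeros(B, L, dtype=torch.bool)  # no padding in this toy batch
    token_embeds = torch.randn(B, L, 512)           # NMT encoder input embeddings
    slots = CompressedAttention()(bert_states, pad_mask)
    encoder_input = torch.cat([slots, token_embeds], dim=1)   # (B, num_slots + L, 512)
    print(encoder_input.shape)
```

Because the number of slots is fixed, the extra attention cost grows linearly with the source length rather than quadratically, which matches the linear-complexity claim in the abstract; the dual multi-granularity training procedure is not detailed enough in the abstract to sketch here.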