Authors: ZHU Jiahui; HAN Ren[1]; ZHANG Sheng[1]; CHEN Sizhou
Affiliations: [1] School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; [2] Blockchain Industry Institute, Chengdu University of Information Technology, Chengdu 610225, China
Source: 《计算机工程与应用》 (Computer Engineering and Applications), 2025, No. 8, pp. 163-172 (10 pages)
Fund: National Key Research and Development Program of China (2018YFB1700900).
Abstract: Using contextual representations from pre-trained models such as BERT to enhance neural machine translation can significantly improve low-resource translation performance. Existing methods for integrating BERT fall into two main categories: initializing the encoder parameters and fine-tuning, or incorporating contextual embeddings into the translation model. The former involves a large number of trainable parameters and is prone to catastrophic forgetting, while the latter tends to be complex. In addition, both approaches use only the source-side BERT representation and do not fully exploit the dual nature of machine translation. To address these issues, this paper proposes a compressed attention module with linear complexity. First, mBERT contextual embeddings are compressed through learnable compression vectors and aligned to the semantic space of the translation model. Second, the compressed vectors are concatenated with the encoder's input vectors to enhance the source-side semantic representation. Finally, the proposed dual multi-granularity training scheme simultaneously strengthens the bilingual representation ability of the translation model. Experimental results on two public low-resource spoken-language translation datasets from IWSLT show that the method achieves BLEU improvements of 2.07 to 2.66 over the Transformer baseline, validating its effectiveness.
Keywords: mBERT knowledge enhancement; compressed attention; low-resource machine translation; dual training
CLC Number: TP391 [Automation and Computer Technology / Computer Application Technology]
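The abstract outlines the core fusion step: learnable compression vectors attend over the mBERT contextual embeddings in a single pass that is linear in the source length, and the resulting fixed number of compressed vectors is projected into the translation model's semantic space and concatenated with the encoder input. The sketch below illustrates that idea only; the module name CompressedAttention, the slot count num_slots, the dimensions d_bert and d_model, and the prepend-style concatenation are assumptions made for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of the compressed-attention fusion described in the abstract.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class CompressedAttention(nn.Module):
    """Compress a variable-length sequence of mBERT embeddings into a fixed
    number of slots using learnable query vectors (linear in source length)."""

    def __init__(self, d_bert: int = 768, d_model: int = 512, num_slots: int = 8):
        super().__init__()
        # Learnable compression vectors act as a fixed set of queries.
        self.slots = nn.Parameter(torch.randn(num_slots, d_bert) * 0.02)
        self.key = nn.Linear(d_bert, d_bert)
        self.value = nn.Linear(d_bert, d_bert)
        # Align the compressed representations to the NMT model's semantic space.
        self.align = nn.Linear(d_bert, d_model)

    def forward(self, bert_states: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # bert_states: (batch, src_len, d_bert); pad_mask: (batch, src_len), True = padding.
        k = self.key(bert_states)                                   # (B, L, d_bert)
        v = self.value(bert_states)                                 # (B, L, d_bert)
        # One pass over the source length L: cost is O(num_slots * L), linear in L.
        scores = torch.einsum("sd,bld->bsl", self.slots, k) / k.size(-1) ** 0.5
        scores = scores.masked_fill(pad_mask.unsqueeze(1), float("-inf"))
        attn = scores.softmax(dim=-1)                               # (B, num_slots, L)
        compressed = torch.einsum("bsl,bld->bsd", attn, v)          # (B, num_slots, d_bert)
        return self.align(compressed)                               # (B, num_slots, d_model)


# Usage sketch: prepend the compressed slots to the encoder token embeddings,
# so the Transformer encoder sees [slots ; source tokens].
if __name__ == "__main__":
    B, L = 2, 17
    bert_states = torch.randn(B, L, 768)            # stand-in for frozen mBERT outputs
    pad_mask = torch.zeros(B, L, dtype=torch.bool)  # no padding in this toy batch
    token_embeds = torch.randn(B, L, 512)           # NMT encoder input embeddings
    slots = CompressedAttention()(bert_states, pad_mask)
    encoder_input = torch.cat([slots, token_embeds], dim=1)   # (B, num_slots + L, 512)
    print(encoder_input.shape)
```

Because the number of slots is fixed, the extra attention cost grows linearly with the source length rather than quadratically, which matches the linear-complexity claim in the abstract; the dual multi-granularity training procedure is not detailed enough in the abstract to sketch here.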