Authors: Zhan Siqi (占思琦), Xu Zhizhan (徐志展), Yang Wei (杨威)[2], Xie Qianglai (谢抢来)[2]
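Affiliations: [1] College of Information Engineering, Jiangxi University of Technology, Nanchang 330098, China; [2] Big Data Laboratory of Collaborative Innovation Center, Jiangxi University of Technology, Nanchang 330098, China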
Source: Application Research of Computers (《计算机应用研究》), 2024, No. 3, pp. 799-804, 810 (7 pages)
Funding: Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ2202613, GJJ212015).
Abstract: Neural machine translation (NMT) has achieved remarkable results in many fields, and its advantages on large-scale corpora are well established. When corpus resources are scarce, however, considerable room for improvement remains. The shortage of Chinese-Malay parallel data directly degrades Chinese-Malay translation quality. To address this, the paper proposes a low-resource NMT method based on deep encoded attention and progressive unfreezing. First, the encoder is rebuilt on the XLNet pre-trained model, and an XLNet dynamic aggregation module replaces the output of the conventional encoding layer, compensating for the scarcity of Chinese-Malay data. Second, a parallel cross-attention module in the decoder improves on conventional encoder-decoder attention, strengthening the model's ability to capture latent relationships between source and target words. Finally, the model is trained with a progressive unfreezing strategy to make fuller use of its capacity. Experimental results show that the proposed method significantly improves performance on a small-scale Chinese-Malay dataset, confirming its effectiveness. Compared with other low-resource NMT methods, the proposed method has a simpler structure, improves both the encoder and the decoder, and yields a larger gain in translation quality, offering effective strategies and insights for low-resource machine translation.
Keywords: neural network; Chinese-Malay machine translation; low-resource; progressive unfreezing; pre-training
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
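
Note on progressive unfreezing: the abstract names this training strategy but gives no schedule. The following is a minimal sketch of one common form of the idea, not the authors' implementation. It assumes the model is split into named layer groups ordered from the output side down to the lowest pre-trained layers, and that one more group becomes trainable every few epochs. The group names, the ordering, and the `epochs_per_stage` parameter are all hypothetical.

```python
from typing import Dict, List

# Hypothetical layer groups, ordered from the output side down to the
# lowest pre-trained layers. These names do not come from the paper.
GROUPS_TOP_DOWN = ["decoder", "aggregation", "xlnet_upper", "xlnet_lower"]


def trainable_groups(groups_top_down: List[str], epoch: int,
                     epochs_per_stage: int = 1) -> List[str]:
    """Return the groups that are trainable at a 0-based epoch.

    Assumed schedule: training starts with only the first group unfrozen,
    and one further group is unfrozen every `epochs_per_stage` epochs
    until every group is trainable.
    """
    if epoch < 0:
        raise ValueError("epoch must be >= 0")
    if epochs_per_stage < 1:
        raise ValueError("epochs_per_stage must be >= 1")
    n_unfrozen = min(len(groups_top_down), epoch // epochs_per_stage + 1)
    return groups_top_down[:n_unfrozen]


def freeze_plan(groups_top_down: List[str], total_epochs: int,
                epochs_per_stage: int = 1) -> Dict[int, List[str]]:
    """Map each epoch to the list of groups trainable during that epoch."""
    return {
        epoch: trainable_groups(groups_top_down, epoch, epochs_per_stage)
        for epoch in range(total_epochs)
    }


if __name__ == "__main__":
    for epoch, groups in freeze_plan(GROUPS_TOP_DOWN, total_epochs=5).items():
        print(epoch, groups)
```

In a PyTorch training loop, a plan like this would typically be applied at the start of each epoch by setting `requires_grad` to `True` on the parameters of the listed groups and to `False` on the rest; whether the paper unfreezes in this order or at this pace is not stated in the abstract.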