A sentence-level neural quality estimation of machine translation based on subword units  Cited by: 13


Authors: LI Peiyun; ZHAI Yujin; XIANG Qingyu; LI Maoxi[1]; QIU Bailian; LUO Wenbing[1]; WANG Mingwen[1] (School of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, China)

Affiliation: [1] School of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China

Source: Journal of Xiamen University (Natural Science), 2020, No. 2, pp. 159-166 (8 pages)

Funding: National Natural Science Foundation of China (61662031, 61462044, 61876074).

Abstract: Current state-of-the-art translation quality estimation systems use the encoder-decoder model from neural machine translation as a feature extractor. Because the vocabulary size is restricted, this approach is prone to data sparseness, so many out-of-vocabulary words cannot be evaluated correctly. To alleviate this problem, after a detailed analysis of the characteristics of different subword segmentation methods, we propose neural quality estimation approaches based on byte-pair-encoding (BPE) subword segmentation and on unigram language model subword segmentation. The quality estimation scores of these two systems are then combined with the scores of a word-based neural quality estimation system. Experimental results on the data sets of the WMT18 sentence-level quality estimation task show that the ensemble system combining BPE subwords, unigram language model subwords, and words outperforms the best participating systems in WMT18 on several evaluation subtasks (translation directions). Further analysis reveals that combining sentence segmentations of different granularity improves the robustness of the quality estimation system.

Keywords: quality estimation; neural machine translation; subword; encoder-decoder model; recurrent neural network; joint neural network

CLC number: TP391 [Automation and Computer Technology - Computer Application Technology]
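
The abstract attributes the out-of-vocabulary problem to a restricted word vocabulary and names BPE subword segmentation as one remedy. The following is a minimal sketch of how BPE merges can be learned and then applied to an unseen word. It is an illustration only, not the authors' implementation: the function names (learn_bpe, merge_pair, segment), the end-of-word marker "</w>", the tie-breaking rule, and the toy word frequencies are all assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` BPE merge operations from a {word: frequency} dict."""
    # Start from characters, with a hypothetical end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Pick the most frequent pair; ties are broken lexicographically
        # (an assumption made here only to keep the sketch deterministic).
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus statistics (made up for illustration).
toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(toy_freqs, num_merges=3)
print(segment("lowest", merges))  # → ['l', 'o', 'w', 'est</w>']
```

The word "lowest" never appears in the toy corpus, yet it is segmented into known units rather than mapped to a single unknown token, which is the property the abstract relies on to reduce data sparseness.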

 
