Multi-System Combination for Machine Translation Based on Confusion Network Decoding  Cited by: 3

Confusion Network Based System Combination for Statistical Machine Translation


Authors: DU Jinhua[1], WEI Wei[1], XU Bo[1]

Affiliation: [1] Digital Content Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190

Source: Journal of Chinese Information Processing (《中文信息学报》), 2008, No. 4, pp. 48-54 (7 pages)

Funding: National 863 Program of China (2006AA01Z194)

Abstract: Building on an analysis of several widely used system combination methods for statistical machine translation, this paper proposes an improved multi-system combination framework that integrates Minimum Bayes Risk (MBR) decoding with multi-feature Confusion Network (CN) decoding. Combination proceeds in two steps: (1) from the N-best outputs of several translation systems, an MBR decoder selects the minimum-risk hypothesis as the alignment reference; (2) the remaining N-best hypotheses are aligned to that reference to build the confusion network. The multi-feature CN is based on a log-linear model, which brings additional knowledge sources into the selection of the optimal path. After combination, the BLEU score is 2.19% higher than that of the best single system before combination. For alignment, the paper also proposes GIZA-TER, a modified Translation Error Rate (Translation Edit Rate, TER) criterion that supports more effective phrase reordering in the CN. Significance tests in the experiments confirm the effectiveness of the proposed methods.
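The two steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several simplifications are assumptions made here: the MBR loss is a length-normalized word edit distance (the abstract does not state the loss), posteriors default to uniform, alignment is plain Levenshtein without the phrase shifts of TER or GIZA-TER, words inserted relative to the skeleton are dropped, and the best path is picked by simple per-slot voting rather than by the log-linear multi-feature model. All function names are hypothetical.

```python
from collections import Counter

EPS = ""  # empty arc: a hypothesis has no word for this skeleton slot


def align(ref, hyp):
    """Levenshtein-align hyp to ref; return (distance, [(ref_index_or_None, hyp_word_or_None)])."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    pairs, i, j = [], n, m
    while i > 0 or j > 0:  # backtrace from the bottom-right cell
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((i - 1, hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((i - 1, None))        # skeleton word missing in hyp
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))   # extra word in hyp
            j -= 1
    return d[n][m], pairs[::-1]


def mbr_select(hyps, probs=None):
    """Step (1): index of the hypothesis with minimum expected loss.

    Assumption: loss = edit distance / length of the pseudo-reference,
    with uniform posteriors unless probs is given.
    """
    probs = probs or [1.0 / len(hyps)] * len(hyps)

    def risk(i):
        return sum(p * align(h, hyps[i])[0] / max(len(h), 1)
                   for h, p in zip(hyps, probs))

    return min(range(len(hyps)), key=risk)


def build_confusion_network(hyps, skel):
    """Step (2): one vote counter per skeleton word (insertions are dropped in this sketch)."""
    slots = [Counter({w: 1}) for w in hyps[skel]]
    for k, hyp in enumerate(hyps):
        if k == skel:
            continue
        for ref_idx, word in align(hyps[skel], hyp)[1]:
            if ref_idx is not None:
                slots[ref_idx][word if word is not None else EPS] += 1
    return slots


def consensus(slots):
    """Pick the most-voted arc in every slot and skip empty arcs."""
    best = (slot.most_common(1)[0][0] for slot in slots)
    return [w for w in best if w != EPS]


hyps = [s.split() for s in ("the cat sat on the mat",
                            "the cat sits on the mat",
                            "a cat sat on mat")]
skel = mbr_select(hyps)
print(" ".join(consensus(build_confusion_network(hyps, skel))))  # the cat sat on the mat
```

Per-slot voting is the single-feature special case of CN decoding. A closer approximation of the multi-feature decoder described above would score each path with a weighted sum of features such as word posteriors and a language model, which this sketch leaves out.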

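Keywords: artificial intelligence; machine translation; multi-system combination; minimum Bayes risk decoding; multi-feature confusion network; GIZA-TER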

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
