Improved Statistical Machine Translation with Source Language Paraphrase

Cited by: 4

Source: Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition), 2015, No. 2, pp. 342-348 (7 pages)

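Funding: Supported by the National International Science and Technology Cooperation Program (2014DFA11350); the National Natural Science Foundation of China (61370130); and the Beijing Jiaotong University Talent Fund (2011RC034)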

Abstract: The performance of statistical machine translation (SMT) suffers when the parallel corpus is too small to supply adequate translation knowledge. To alleviate this problem, the authors propose a paraphrase-based SMT framework with three components: 1) acquiring a paraphrase table with probabilities by using a third language; 2) representing the multiple paraphrased forms of an input sentence as a lattice and extending the decoder so that it can decode lattice input; 3) integrating paraphrase knowledge as features into the objective function of the log-linear model. With the original translation table left unchanged, the framework both increases the phrase table's coverage of source-language expressions and increases the diversity of candidate translations in the target language. To verify the method, comparative experiments are conducted on three training sets of different sizes, measuring how much paraphrasing contributes to SMT performance. The results show that translation performance improves significantly (BLEU +1.4%) with the smallest training set (10 K sentence pairs), and that a smaller improvement (BLEU +0.32%) is still obtained with the largest training set (1 M sentence pairs).
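The three components in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so everything here is an assumption: the paraphrase probability uses the common pivot formulation p(s2 | s1) = sum over pivot phrases f of p(f | s1) * p(s2 | f), the lattice is simplified to phrase-labeled edges between word positions, and the paraphrase feature is taken to be the log-probability of the chosen path. The function names and the toy German pivot entries are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def pivot_paraphrases(src_to_pivot, pivot_to_src, min_prob=0.0):
    """Assumed pivot formulation: p(s2 | s1) = sum_f p(f | s1) * p(s2 | f)."""
    table = defaultdict(lambda: defaultdict(float))
    for s1, pivots in src_to_pivot.items():
        for f, p_f_given_s1 in pivots.items():
            for s2, p_s2_given_f in pivot_to_src.get(f, {}).items():
                if s2 != s1:  # a phrase is not its own paraphrase
                    table[s1][s2] += p_f_given_s1 * p_s2_given_f
    # Convert to plain dicts and drop low-probability paraphrases.
    return {
        s1: {s2: p for s2, p in alts.items() if p >= min_prob}
        for s1, alts in table.items()
    }


def build_lattice(tokens, paraphrases, max_len=3):
    """Return edges (start, end, phrase_tokens, prob) over word positions.

    Simplification: a paraphrase is one edge labeled with a whole phrase.
    Original words get probability 1.0.
    """
    edges = [(i, i + 1, (tok,), 1.0) for i, tok in enumerate(tokens)]
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            phrase = " ".join(tokens[i:j])
            for alt, prob in paraphrases.get(phrase, {}).items():
                edges.append((i, j, tuple(alt.split()), prob))
    return edges


def paraphrase_feature(path_edges):
    """Assumed log-linear feature: sum of log paraphrase probabilities on a path."""
    return sum(math.log(prob) for _, _, _, prob in path_edges)


# Toy phrase tables (hypothetical values): source -> pivot and pivot -> source.
src_to_pivot = {"buy": {"kaufen": 1.0}}
pivot_to_src = {"kaufen": {"buy": 0.75, "purchase": 0.25}}

table = pivot_paraphrases(src_to_pivot, pivot_to_src)
lattice = build_lattice(["i", "buy", "books"], table)
for edge in lattice:
    print(edge)
```

In this sketch a path that uses only original words scores 0 on the paraphrase feature, while a path through a paraphrase edge is penalized by its log-probability. A decoder could then trade that penalty against better phrase-table coverage, which is the effect the abstract describes.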

Keywords: paraphrase knowledge; phrase translation table; features; decoder

Classification: TP391.2 [Automation and Computer Technology: Computer Application Technology]
