Translation Error Detection for Statistical Machine Translation Based on Multi-Feature Fusion

Published English title: Error Detection in Translation Version by Statistical Machine Translation Based on Feature Integration

Source: Journal of Xi'an University of Technology, 2013, No. 1, pp. 32-37 (6 pages)

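Funding: National Natural Science Foundation of China (61100085); Special Scientific Research Program of the Shaanxi Provincial Department of Education (11JK1029)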

Abstract: Three typical word posterior probability (WPP) features are extracted (fixed-position WPP, target-position sliding-window WPP, and word-alignment-based WPP), together with three linguistic features (word, part of speech (POS), and syntactic features extracted by the LG parser). On this basis, a source-side word feature is also extracted. Using the Chinese-English NIST data sets, a maximum entropy classifier is then applied to examine how the different WPP features affect error detection performance, both when used alone and when combined with the other features. The experimental results show that WPP features computed by different methods have a significant effect on the classification error rate (CER), and that adding the source-side word feature to the combination of WPP and linguistic features significantly reduces the CER and improves the ability to detect errors in translation output.
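As an illustration of two quantities named in the abstract, the following is a minimal sketch of a fixed-position word posterior probability and of the classification error rate. It is not the authors' implementation. It assumes the common formulation in which the fixed-position WPP of a word is the share of N-best probability mass belonging to hypotheses that contain that word at the same target position, and in which CER is the fraction of words whose predicted label differs from the reference label. The function names, the `(tokens, weight)` hypothesis format, and the toy N-best list are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed (hypothetical) format: each N-best entry is (tokens, non-negative weight).
Hypothesis = Tuple[Sequence[str], float]


def fixed_position_wpp(nbest: List[Hypothesis], word: str, position: int) -> float:
    """Share of N-best weight from hypotheses that have `word` at `position`."""
    total = sum(weight for _, weight in nbest)
    if total == 0:
        return 0.0
    matched = sum(
        weight
        for tokens, weight in nbest
        if position < len(tokens) and tokens[position] == word
    )
    return matched / total


def classification_error_rate(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of word labels (e.g. correct/incorrect) that are misclassified."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("label sequences must not be empty")
    wrong = sum(p != r for p, r in zip(predicted, reference))
    return wrong / len(reference)


# Toy N-best list, invented for illustration only.
nbest = [
    (["the", "cat", "sat"], 0.5),
    (["the", "dog", "sat"], 0.3),
    (["a", "cat", "sits"], 0.2),
]
wpp_cat = fixed_position_wpp(nbest, "cat", 1)
cer = classification_error_rate(["c", "i", "c", "c"], ["c", "c", "c", "c"])
```

In this toy list, two of the three hypotheses place "cat" at position 1, so its WPP is the sum of their weights divided by the total weight. A classifier such as the maximum entropy model mentioned in the abstract would take values like this as one input feature alongside the linguistic and source-side features.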

Keywords: maximum entropy classifier; word posterior probability; linguistic features; source-side word feature; error detection

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
