Error Detection in Translation Version by Statistical Machine Translation Based on Feature Integration

Authors: WANG Sha[1], DU Jinhua[1], LIU Ding[1]

Affiliation: [1] School of Automation and Information Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048

Source: Journal of Xi'an University of Technology, 2013, No. 1, pp. 32-37 (6 pages)

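Funding: National Natural Science Foundation of China (61100085); Special Scientific Research Program of the Shaanxi Provincial Department of Education (11JK1029)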

Abstract: Three typical word posterior probability (WPP) features (fixed-position WPP, target-position-window WPP, and word-alignment-based WPP) and three linguistic features (word, part of speech, and syntactic features extracted by the LG parser) are extracted to detect errors in machine translation output. On this basis, an additional word feature is extracted from the source side. Using the Chinese-English NIST data sets, a maximum entropy classifier is then employed to examine how the different WPP features affect error detection performance, both when used alone and when combined with the other features. The experimental results show that the method used to compute the WPP feature has a significant effect on the classification error rate (CER), and that adding the source-side word feature to the combination of WPP and linguistic features significantly reduces the CER and improves translation error detection.

Keywords: maximum entropy classifier; word posterior probability; linguistic features; source-side word feature; error detection

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
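
The abstract names a fixed-position WPP feature and the classification error rate (CER) without defining them. The sketch below illustrates the commonly used definitions only, and is not taken from the paper: the fixed-position WPP of a word is assumed to be the normalized probability mass of the N-best hypotheses that contain the same word at the same position, and CER is assumed to be the fraction of words whose correct/incorrect label is predicted wrongly. The function names and the N-best input format (token list plus log score) are hypothetical.

```python
from math import exp


def fixed_position_wpp(nbest, hyp):
    """Return one posterior per word of `hyp`.

    nbest: list of (tokens, log_score) pairs (assumed input format).
    hyp:   list of tokens, typically the 1-best translation.
    """
    # Turn log scores into normalized weights; subtracting the maximum
    # keeps exp() numerically stable.
    max_log = max(score for _, score in nbest)
    weights = [exp(score - max_log) for _, score in nbest]
    total = sum(weights)

    wpp = []
    for i, word in enumerate(hyp):
        # Sum the weight of every hypothesis with the same word at position i.
        mass = sum(
            w
            for (tokens, _), w in zip(nbest, weights)
            if i < len(tokens) and tokens[i] == word
        )
        wpp.append(mass / total)
    return wpp


def classification_error_rate(predicted, gold):
    """Fraction of word labels that the classifier got wrong."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("label sequences must be non-empty and equally long")
    wrong = sum(p != g for p, g in zip(predicted, gold))
    return wrong / len(gold)
```

In a setup like the one the abstract describes, such posteriors would be one input among several (together with the linguistic and source-side features) to a classifier that labels each word as correct or incorrect; the classifier itself is omitted here.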
