Minimum Error Rate Training Based on Ensemble Learning


Authors: Chen Fang (陈昉)[1], Wang Zhihao (王志豪)[2], Zhao Chengqi (赵程绮), Li Jiangmeng (李江梦)

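Affiliations: [1] School of Software, Xiamen University; [2] School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005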

Source: Journal of Xiamen University: Natural Science, 2015, No. 6, pp. 893-899 (7 pages)

Abstract: Minimum error rate training (MERT) is the standard parameter-tuning method in statistical machine translation and plays an important role in building translation models. However, MERT is prone to overfitting: weights tuned on the development set often do not transfer well to the test set. To address this problem, we introduce ensemble learning into the tuning process. We first select different feature subsets and run MERT on each to obtain several groups of feature weights. We then compute the spatial distance between the weight vectors and filter out unreasonable ones. Finally, we take a weighted average of the remaining groups, weighted by their BLEU (bilingual evaluation understudy) scores on the development set, as the final feature weights. Experiments on NIST and IWSLT data show that the method is effective for translation tasks in which the training and test sets come from different domains.
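The filtering and averaging steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which distance measure or rejection rule is used, so the Euclidean distance to the centroid and the `max_ratio` cutoff are assumptions, and the function name `combine_weights` is hypothetical. The MERT runs themselves are taken as given; the sketch starts from their output, with every candidate expressed over the full feature set.

```python
import math
from statistics import mean


def combine_weights(candidates, bleu_scores, max_ratio=1.5):
    """Drop outlying weight vectors, then average the rest weighted by BLEU.

    candidates  : equal-length weight vectors, one per feature subset
    bleu_scores : development-set BLEU of each candidate, in the same order
    max_ratio   : assumed cutoff (>= 1); a candidate is dropped when its
                  distance to the centroid exceeds max_ratio * mean distance
    """
    if not candidates or len(candidates) != len(bleu_scores):
        raise ValueError("need one BLEU score per candidate weight vector")
    dim = len(candidates[0])

    # Centroid of all candidate weight vectors.
    centroid = [mean(w[i] for w in candidates) for i in range(dim)]

    # Euclidean distance to the centroid (an assumed choice of metric).
    dists = [math.dist(w, centroid) for w in candidates]
    cutoff = max_ratio * mean(dists)

    # Keep candidates that lie close enough to the centroid.
    kept = [(w, b) for w, b, d in zip(candidates, bleu_scores, dists)
            if d <= cutoff]

    total = sum(b for _, b in kept)
    if total <= 0:
        raise ValueError("BLEU scores of kept candidates must sum to > 0")

    # BLEU-weighted average of the surviving weight vectors.
    return [sum(w[i] * b for w, b in kept) / total for i in range(dim)]


# Illustrative, made-up numbers: the last vector lies far from the others.
weights = combine_weights(
    [[1.0, 0.0], [1.0, 0.2], [0.9, 0.1], [5.0, 5.0]],
    [30.0, 32.0, 31.0, 10.0],
)
```

With `max_ratio` at least 1, at least one candidate always survives the filter, because some candidate must lie no farther from the centroid than the mean distance.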

Keywords: machine translation; minimum error rate training; ensemble learning

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
