Multi-domain Bayesian non-parametric phrasal induction in machine translation (cited by: 1)


Authors: LIU Yupeng [1,2]; MA Chunguang [2]; ZHU Xiaoning [3]; QIAO Xiuming

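Affiliations: [1] School of Software, Harbin University of Science and Technology, Harbin 150001, Heilongjiang, China; [2] College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, Heilongjiang, China; [3] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China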

Source: Journal of Harbin Engineering University, 2017, No. 10, pp. 1616-1622 (7 pages)

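Funding: National Natural Science Foundation of China, Young Scientists Fund (61300115); China Postdoctoral Science Foundation (2014M561331); Science and Technology Research Project of the Heilongjiang Provincial Department of Education (12521073)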

Abstract: Multi-domain machine translation has long been a key research topic in machine translation, and phrase induction is central to it. Traditional weighting methods do not take the whole induction process into account. This paper proposes a phrase induction method based on the hierarchical Pitman-Yor process and introduces multiple channels into the model to balance the influence of each domain during induction. From the modeling perspective, the method is a generative model, which is more expressive, and it models alignment and phrase extraction jointly, which overcomes the effect of alignment errors on conventional phrase extraction. In terms of complexity, the model is independent of decoding and is therefore easier to train. In terms of multi-domain combination, the domains are combined during phrase induction itself, so the whole induction process is better taken into account. Translation performance was validated on two different types of corpora. Compared with traditional single-domain heuristic phrase extraction and multi-domain weighting, the BLEU score improved.

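Keywords: multi-domain machine translation; non-parametric Bayesian; phrase induction; Pitman-Yor process; generative model; block sampling; Chinese restaurant process; BLEU score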

Classification number: TP391.23 [Automation and Computer Technology: Computer Application Technology]
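
The abstract and keywords name the Pitman-Yor process and the Chinese restaurant process, but this record does not reproduce the model's equations. As a rough illustration only, the sketch below simulates the standard single-level Pitman-Yor Chinese restaurant process; it is not the authors' hierarchical multi-domain model. The function names and the parameters `discount` and `strength` are assumptions chosen for this example. With n customers seated at K tables, a new customer joins table k with probability (c_k - discount) / (n + strength) and opens a new table with probability (strength + discount * K) / (n + strength).

```python
import random


def new_table_probability(tables, discount, strength):
    """Probability that the next customer opens a new table.

    `tables` is a list of table sizes c_k (hypothetical representation).
    """
    n = sum(tables)
    if n == 0:
        return 1.0  # the first customer always opens a table
    return (strength + discount * len(tables)) / (n + strength)


def pitman_yor_crp(n_customers, discount, strength, seed=0):
    """Seat customers one at a time and return the list of table sizes."""
    if not (0 <= discount < 1) or strength <= -discount:
        raise ValueError("need 0 <= discount < 1 and strength > -discount")
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers at table k
    for _ in range(n_customers):
        if not tables:
            tables.append(1)  # first customer: new table with probability 1
            continue
        # Unnormalised weights: one per existing table, plus one for a new table.
        weights = [c - discount for c in tables]
        weights.append(strength + discount * len(tables))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
    return tables


if __name__ == "__main__":
    sizes = pitman_yor_crp(1000, discount=0.5, strength=1.0, seed=1)
    print(len(sizes), "tables; largest has", max(sizes), "customers")
```

In a phrase induction setting, each "table" would loosely correspond to a reused phrase pair and a new table to a draw from a base distribution; how the paper ties restaurants together across domains and channels is not stated in this record.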

 
