Authors: LIU Yupeng (刘宇鹏) [1,2]; MA Chunguang (马春光) [2]; ZHU Xiaoning (朱晓宁) [3]; QIAO Xiuming (乔秀明)
Affiliations: [1] School of Software, Harbin University of Science and Technology, Harbin 150001, Heilongjiang, China; [2] College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, Heilongjiang, China; [3] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Source: Journal of Harbin Engineering University, 2017, No. 10, pp. 1616-1622 (7 pages)
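Funding: National Natural Science Foundation of China, Youth Program (61300115); China Postdoctoral Science Foundation (2014M561331); Science and Technology Research Project of the Heilongjiang Provincial Department of Education (12521073)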
Abstract: Multi-domain machine translation has long been a focus of machine translation research, and phrase induction is central to it. Traditional weighting methods do not take the entire induction process into account. This paper proposes a method that uses a hierarchical Pitman-Yor process for phrase induction and introduces multiple channels into the model to balance the influence of each domain during induction. In terms of modeling, the method is a generative model, which is more expressive, and it models alignment and phrase extraction jointly, which overcomes the effect of alignment errors on conventional phrase extraction. In terms of complexity, the model is independent of decoding and is easier to train. In terms of multi-domain combination, the domains are combined during phrase induction itself, so the entire induction process is better taken into account. Machine translation performance was validated on two different types of corpora. Compared with traditional single-domain heuristic phrase extraction and multi-domain weighting, the BLEU score improved.
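Keywords: multi-domain machine translation; nonparametric Bayesian; phrase induction; Pitman-Yor process; generative model; block sampling; Chinese restaurant process; BLEU score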
Classification: TP391.23 [Automation and Computer Technology, Computer Application Technology]