Research on LSTM-Based Mongolian-Chinese Machine Translation (基于LSTM的蒙汉机器翻译的研究)  Cited by: 8

Mongolian-Chinese machine translation based on LSTM

Authors: LIU Wan-wan (刘婉婉)[1]; SU Yi-la (苏依拉)[1]; WU Ni-er (乌尼尔)[1]; RENQING Dao-er-ji (仁庆道尔吉)[1] (College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, China)

Affiliation: [1] College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, Inner Mongolia, China

Source: Computer Engineering & Science (《计算机工程与科学》), 2018, No. 10, pp. 1890-1896 (7 pages)

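Funding: National Natural Science Foundation of China (61363052; 61502255); Natural Science Foundation of Inner Mongolia Autonomous Region (2016MS0605); Inner Mongolia Ethnic Affairs Commission Fund (MW-2017-MGYWXXH-03)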

Abstract: Mongolian-Chinese machine translation in Inner Mongolia lags behind, and the available Mongolian-Chinese parallel corpus is small. Under these conditions, traditional statistical machine translation methods suffer from data sparsity and overfitting during training, which lowers translation quality. To address this, we propose a Mongolian-Chinese neural machine translation method based on LSTM. It uses the long short-term memory (LSTM) model to build an end-to-end neural network framework that models the Mongolian-Chinese machine translation system. To capture Mongolian semantic information more effectively, Mongolian words are segmented into morphemes according to the characteristics of the language, and the morphemes are fed into the model. In addition, a local attention mechanism is introduced to compute the weights of the source-language morphemes associated with each target word, yielding alignment probabilities between Mongolian and Chinese vocabulary and thereby improving translation quality. Experimental results show that the proposed method achieves better translation quality than the traditional Mongolian-Chinese translation system.
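The abstract does not say which local attention formulation the authors use. As an illustration only, the sketch below assumes a common windowed form: dot-product scores, a softmax restricted to a window of source morphemes around a centre position, and a Gaussian weighting that favours positions near the centre. The function names, the `center` and `half_width` parameters, and the rounding of the centre are all assumptions made for this example, not details taken from the paper.

```python
import math


def local_attention_weights(target_state, source_states, center, half_width):
    """Attention weights over source positions in [center - D, center + D].

    Hypothetical sketch: assumes dot-product scoring and a Gaussian with
    sigma = D / 2 centred on `center`; requires half_width >= 1 and
    0 <= center <= len(source_states) - 1.
    """
    mid = int(round(center))
    lo = max(0, mid - half_width)
    hi = min(len(source_states) - 1, mid + half_width)
    sigma = half_width / 2.0

    # Dot-product score between the decoder state and each source state in the window.
    scores = [
        sum(a * b for a, b in zip(target_state, source_states[s]))
        for s in range(lo, hi + 1)
    ]

    # Numerically stable softmax over the window only.
    top = max(scores)
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)

    # Positions outside the window keep weight 0.
    weights = [0.0] * len(source_states)
    for i, s in enumerate(range(lo, hi + 1)):
        gauss = math.exp(-((s - center) ** 2) / (2.0 * sigma ** 2))
        weights[s] = (exps[i] / total) * gauss
    return weights


def context_vector(weights, source_states):
    """Weighted sum of source states, used when predicting the target word."""
    dim = len(source_states[0])
    return [sum(w * h[k] for w, h in zip(weights, source_states)) for k in range(dim)]


# Six source morpheme states (toy 2-dimensional vectors), window of 3 around position 2.
source = [[1.0, 0.0]] * 6
weights = local_attention_weights([1.0, 0.0], source, center=2.0, half_width=1)
context = context_vector(weights, source)
```

Restricting the softmax to a window means each target word attends to only a few source morphemes rather than the whole sentence, which is the property the abstract attributes to local attention.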

Keywords: attention; end-to-end model; machine translation; Mongolian-Chinese; LSTM neural network

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
