Authors: ZHANG Chaoyi (张超轶), CHEN Yuan (陈媛) [2], ZHANG Juwei (张聚伟)
Affiliations: [1] Electrical Engineering School, Henan University of Science & Technology, Luoyang 471023, Henan, China; [2] Foreign Languages School, Henan University of Science & Technology, Luoyang 471023, Henan, China; [3] Henan Province New Energy Vehicle Power Electronics and Power Transmission Engineering Research Center, Luoyang 471023, Henan, China
Source: Journal of Henan University of Science and Technology: Natural Science (《河南科技大学学报(自然科学版)》), 2022, No. 4, pp. 61-66, 75, M0006 (8 pages in total)
Funding: National Natural Science Foundation of China (U2004163).
Abstract: To address the scarcity of parallel corpora for English-Chinese machine translation in the field of electrical engineering, an embedding layer parameter initialization method that fuses domain term information is proposed, building on a translation model trained with a general-domain corpus. First, the text is preprocessed by word segmentation so that each term is treated as a single minimal unit. Then, two sets of word vectors are trained with GloVe and Word2vec on different monolingual corpora and used to initialize the embedding layer's vector representations of common words and term words, respectively. Finally, a term dictionary is used to look up and replace out-of-vocabulary words, which alleviates the severe out-of-vocabulary problem that terminology causes during translation. Experiments use an attention-based neural machine translation model as the baseline system. The results show that the proposed model improves translation performance on the electrical engineering test corpus by 2.713 BLEU points.
Keywords: electrical engineering domain; machine translation; term information; embedding layer parameters; initialization
Classification: TP391.2 [Automation and Computer Technology: Computer Application Technology]
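The abstract describes two steps that a short sketch may make more concrete: initializing the embedding layer from two separately trained vector tables (one for common words, one for term words), and replacing out-of-vocabulary output tokens through a term dictionary. The Python below is only an illustration under stated assumptions, not the authors' implementation. The function names `init_embedding` and `replace_unknown`, the toy vectors, the random fallback for words found in neither table, and the use of a precomputed output-to-source alignment (for example, taken from attention weights) are all hypothetical choices that the record does not specify.

```python
import numpy as np


def init_embedding(vocab, common_vecs, term_vecs, term_set, dim, seed=0):
    """Build an embedding matrix from two pretrained vector tables.

    Term words are looked up in term_vecs, all other words in common_vecs.
    Both tables are assumed to share the same dimensionality `dim`.
    """
    rng = np.random.default_rng(seed)
    # Assumption: words missing from both tables keep a small random vector.
    matrix = rng.normal(0.0, 0.1, size=(len(vocab), dim))
    for idx, word in enumerate(vocab):
        table = term_vecs if word in term_set else common_vecs
        if word in table:
            matrix[idx] = table[word]
    return matrix


def replace_unknown(output_tokens, source_tokens, align, term_dict, unk="<unk>"):
    """Replace each unknown output token via the term dictionary.

    align[i] is the source position assumed to correspond to output token i.
    """
    result = []
    for pos, tok in enumerate(output_tokens):
        if tok == unk:
            src_word = source_tokens[align[pos]]
            # Assumption: copy the source word when the dictionary has no entry.
            tok = term_dict.get(src_word, src_word)
        result.append(tok)
    return result


# Toy data; real vectors would come from GloVe and Word2vec training runs.
vocab = ["the", "transformer", "circuit breaker", "<unk>"]
common_vecs = {"the": np.ones(4)}
term_vecs = {"transformer": np.full(4, 2.0), "circuit breaker": np.full(4, 3.0)}
term_set = {"transformer", "circuit breaker"}
E = init_embedding(vocab, common_vecs, term_vecs, term_set, dim=4)

source = ["the", "circuit breaker", "trips"]
output = ["<unk>", "跳闸"]
fixed = replace_unknown(output, source, align=[1, 2],
                        term_dict={"circuit breaker": "断路器"})
```

In a full system, the matrix `E` would be copied into the translation model's embedding layer before training, and the multi-word entry "circuit breaker" would be a single vocabulary item only because segmentation has already kept each term as one unit.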