English-Chinese Intelligent Translation of Computer Algorithm Corpus

Cited by: 2


Authors: CHEN Jia-le; ZHANG Yan-ling [1]

Affiliation: [1] Faculty of Computer Science and Network Engineering, Guangzhou University, Guangzhou 510006, Guangdong, China

Source: Computer Technology and Development, 2021, No. 7, pp. 176-181 (6 pages)

Funding: 2018 Ministry of Education Industry-University Cooperative Education Project, Second Batch (201802093015).

Abstract: The free online translation systems currently available on the Internet are all neural machine translation models trained on general-purpose corpora. They perform well on general text, but in a specific vertical domain (such as computer science) neither the training text nor the training algorithm is tailored to the domain, so the output contains mistranslated or missing technical terms and is hard to read. Demand for automated machine translation in specific vertical domains is therefore growing. In this work, English-Chinese bilingual example sentences related to computer algorithms are collected with a web crawler; word vectors carrying contextual information are generated with the Word2Vec algorithm; the word vectors are embedded into Google's open-source GNMT model to train an English-Chinese translation model; and a simple translation program is built on the trained model. Controlled experiments examine how the word-vector length in Word2Vec affects the computed text similarity between words and the training of GNMT, and how two GNMT hyperparameters, the number of hidden-layer units num_unit and the batch size batch_size, affect training. The best English-Chinese translation model is then trained on the basis of the combined experimental results.
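The abstract refers to computing text similarity between words from their word vectors. As a rough illustration only, and not the authors' implementation, the sketch below computes cosine similarity, a common choice for comparing Word2Vec embeddings. The names cosine_similarity, most_similar and toy_vectors are hypothetical, and the 4-dimensional vectors are invented for the example rather than taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(word, vectors):
    """Rank every other word in `vectors` by similarity to `word`, best first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 4-dimensional embeddings, invented for illustration only.
# Real Word2Vec vectors are learned from a corpus and are much longer.
toy_vectors = {
    "sort": [0.9, 0.1, 0.3, 0.0],
    "quicksort": [0.8, 0.2, 0.4, 0.1],
    "graph": [0.0, 0.9, 0.1, 0.7],
}

if __name__ == "__main__":
    for other, score in most_similar("sort", toy_vectors):
        print(f"{other}: {score:.3f}")
```

The vector length studied in the abstract corresponds to the number of components in each list here. With trained embeddings, changing that length changes the similarity scores that a ranking such as most_similar produces.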

Keywords: machine translation; Word2Vec algorithm; word vector; text similarity; GNMT

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
