An Algorithm of Extraction of General Words for Specific Purposes Based on word2vec and Its Application

Original title: 基于word2vec模型的专业通用词提取算法及应用举例


Authors: 田艳 (TIAN Yan) [1]; 王天奇 (WANG Tian-qi) (School of Foreign Languages, Shanghai Jiao Tong University, Shanghai 200240, China)

Affiliation: [1] School of Foreign Languages, Shanghai Jiao Tong University, Shanghai 200240, China

Source: Journal of Cangzhou Normal University (《沧州师范学院学报》), 2018, No. 3, pp. 68-72 (5 pages)

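Funding: National Social Science Fund of China project "Construction and Evaluation of an Online System for Dynamic Translation Learning" (No. 16BYY081); Ministry of Education Humanities and Social Sciences Research general project "A Corpus-Based Study of the Chinese Translation of Marx's Capital" (No. 15YJA740009)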

Abstract: General words for specific purposes (GWSP) are general-vocabulary words used within a specific professional field, and they are often difficult to handle in translation. At present, GWSP are extracted mainly by hand, which places high demands on the analyst's language proficiency and familiarity with the corpus and is also inefficient. This paper proposes an algorithm that extracts GWSP automatically. The algorithm is based on word2vec, a neural network model released by Google, and is implemented as scripts written in Python 2.7. Its application is illustrated with a corpus of the International Financial Reporting Standards.

Keywords: word2vec; extraction of general words for specific purposes; corpus-based translation

Classification code: H315.9 [Language and Writing: English]

 
