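Research on a Method for Automatically Identifying the Affiliated Institutions of Paper Authors Based on Class-Center Vectors (cited by: 5)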

Auto-Identification of Authors Affiliation Based on Class-Center Vectors

Authors: He Tao [1]; Wang Guifang [1]; Ma Tingcan [1] (Wuhan Documentation and Information Center, Chinese Academy of Sciences, Wuhan 430071)

Affiliation: [1] Wuhan Documentation and Information Center, Chinese Academy of Sciences

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2019, No. 7, pp. 716-721 (6 pages)

Funding: Youth Innovation Promotion Association of the Chinese Academy of Sciences (2016160)

Abstract: When organizing and analyzing large collections of scientific and technical literature, it is often necessary to identify automatically the institution to which each author belongs. This requires matching the author address given in a paper to the corresponding institution name. Authors from the same institution often write their address in many different forms across English-language papers, which makes matching difficult and causes string-matching methods to yield unsatisfactory results. To address this problem, the paper proposes a machine learning method based on class-center vectors that exploits the way author addresses are written in English-language papers to match them to institution names automatically. Compared with traditional methods, it does not require tedious matching rules to be written by hand. The method was applied to a data set of Chinese Academy of Sciences (CAS) author addresses, and the experimental results demonstrate its feasibility.
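The idea of matching an address to the nearest class center can be illustrated with a minimal sketch. The abstract does not say how addresses are turned into vectors or how similarity is measured, so everything below is an assumption made for illustration: word-level term-frequency vectors, cosine similarity, and the helper names `tokenize`, `to_unit_vector`, `train_class_centers`, and `identify_institution` are all hypothetical, and the sample addresses are made-up training data, not the paper's data set.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(address):
    """Lowercase an address string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", address.lower())


def to_unit_vector(tokens):
    """Build an L2-normalized term-frequency vector, stored as a dict."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def train_class_centers(labeled_addresses):
    """Average the unit vectors of each institution's addresses into one center."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for address, institution in labeled_addresses:
        for term, weight in to_unit_vector(tokenize(address)).items():
            sums[institution][term] += weight
        sizes[institution] += 1
    return {
        inst: {t: w / sizes[inst] for t, w in vec.items()}
        for inst, vec in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_institution(address, centers):
    """Return the institution whose class center is most similar to the address."""
    vec = to_unit_vector(tokenize(address))
    return max(centers, key=lambda inst: cosine(vec, centers[inst]))


# Illustrative training data (assumed, not taken from the paper).
training = [
    ("Wuhan Documentation and Information Center, Chinese Academy of Sciences, Wuhan 430071",
     "Wuhan Documentation and Information Center"),
    ("Chinese Acad Sci, Wuhan Doc & Informat Ctr, Wuhan, Peoples R China",
     "Wuhan Documentation and Information Center"),
    ("Institute of Physics, Chinese Academy of Sciences, Beijing",
     "Institute of Physics"),
    ("Chinese Acad Sci, Inst Phys, Beijing, Peoples R China",
     "Institute of Physics"),
]
centers = train_class_centers(training)
print(identify_institution("Wuhan Doc and Informat Ctr, CAS, Wuhan, China", centers))
```

Because each institution is summarized by a single averaged vector, a new spelling variant only needs to share enough tokens with that center to be matched, which is why no hand-written matching rules appear in the sketch.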

Keywords: author address; institution name; class-center vector; machine learning

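Classification codes: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; G254 [Automation and Computer Technology: Control Science and Engineering]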
