Application Research of Incremental Person Name Disambiguation in English Scientific and Technological Literature Based on Multi-Feature Fusion (多特征融合的英文科技文献增量式人名消歧应用研究)  Cited by: 4


Authors: Ruan Guangce (阮光册); Tu Shiwen (涂世文); Tian Xin (田欣); Zhang Li (张莉) (Department of Information Management, Faculty of Economics and Management, East China Normal University, Shanghai 200241; Shanghai Technology Development Co., Ltd., Shanghai 200235)

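Affiliations: [1] Department of Information Management, Faculty of Economics and Management, East China Normal University, Shanghai 200241 [2] Shanghai Technology Development Co., Ltd., Shanghai 200235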

Source: Journal of Intelligence (《情报杂志》), 2021, No. 9, pp. 147-153 (7 pages)

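Funding: Shanghai Municipal Commission of Economy and Informatization project "Shanghai Artificial Intelligence Public R&D Resource Map" (code: XX-RGZN-01-19-5037).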

Abstract: [Purpose/Significance] Duplicate names among English-language authors are very common. This paper addresses incremental person name disambiguation in scientific and technological literature in order to improve the accuracy of author retrieval on academic retrieval platforms. [Method/Process] It proposes a name disambiguation method that fuses the external basic features and the internal semantic features of a document to determine author attribution for newly added English academic literature. First, the metadata fields needed for person name disambiguation are extracted from the academic literature, and the BERT model is used to produce vector representations of the metadata text that carries semantic information. Then, the multi-feature fused data is fed into XGBoost for training. Finally, the trained model assigns authors to newly added documents. [Result/Conclusion] In comparative experiments the method performs well, achieving an F1 score of 95.6%.

Keywords: person name disambiguation; scientific and technological literature; multi-feature fusion; BERT; XGBoost

Classification code: G250 [Cultural Sciences: Library Science]
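
The pipeline summarized in the abstract (extract metadata, vectorize the semantic text, fuse the features, classify, assign each newly added document) can be illustrated with a minimal sketch. This is not the authors' implementation. To stay self-contained it swaps in a bag-of-words vector where the paper uses BERT and a fixed weighted sum where the paper uses a trained XGBoost model. The chosen features (coauthor overlap, venue match, title similarity), the weights, the threshold, and all names and sample records are assumptions made for illustration; the abstract does not list the metadata fields actually used.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Paper:
    title: str
    venue: str
    coauthors: set


@dataclass
class AuthorProfile:
    author_id: str
    papers: list = field(default_factory=list)


def text_vector(text):
    """Bag-of-words counts: a stand-in for a BERT text embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(paper, profile):
    """Fuse external metadata features with one semantic feature."""
    known = set().union(*(p.coauthors for p in profile.papers))
    coauthor_overlap = len(paper.coauthors & known) / max(len(paper.coauthors), 1)
    venue_match = 1.0 if any(p.venue == paper.venue for p in profile.papers) else 0.0
    profile_text = " ".join(p.title for p in profile.papers)
    semantic_sim = cosine(text_vector(paper.title), text_vector(profile_text))
    return [coauthor_overlap, venue_match, semantic_sim]


def match_score(features, weights=(0.5, 0.2, 0.3)):
    """Fixed weighted sum: a stand-in for a trained classifier's match probability."""
    return sum(w * f for w, f in zip(weights, features))


def assign(paper, profiles, threshold=0.3):
    """Attach the paper to the best-scoring candidate, or open a new profile."""
    best = max(profiles, key=lambda pr: match_score(pair_features(paper, pr)), default=None)
    if best is not None and match_score(pair_features(paper, best)) >= threshold:
        best.papers.append(paper)
        return best.author_id
    new_profile = AuthorProfile(author_id=f"new-{len(profiles) + 1}", papers=[paper])
    profiles.append(new_profile)
    return new_profile.author_id


# Hypothetical records: two existing authors who share the same name.
profiles = [
    AuthorProfile("li-wei-1", [Paper("Graph neural networks for molecule property prediction",
                                     "NeurIPS", {"A. Chen", "B. Kumar"})]),
    AuthorProfile("li-wei-2", [Paper("Soil erosion modelling in river basins",
                                     "Catena", {"C. Novak"})]),
]
new_paper = Paper("Message passing neural networks for molecule generation", "ICML", {"A. Chen"})
assigned = assign(new_paper, profiles)
```

In this sketch the new paper shares a coauthor and several title terms with the first profile, so it is attached there; a paper that matches no profile above the threshold opens a new one, which is what makes the assignment incremental. In a real system the scorer would be a model trained on labeled paper-author pairs rather than hand-set weights.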

 
