Co-author and Affiliate Based Name Disambiguation Approach  Cited by: 6


Authors: SHANG Yu-ling, CAO Jian-jun, LI Hong-mei [1], ZHENG Qi-bin [1]

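Affiliations: [1] College of Command Information Systems, PLA University of Science and Technology, Nanjing 210007, China; [2] The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China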

Source: Computer Science (《计算机科学》), 2018, No. 11, pp. 220-225, 260 (7 pages in total)

Funding: Supported by the National Natural Science Foundation of China (61371196) and the China Postdoctoral Science Foundation (2015M582832)

Abstract: Name disambiguation is an important research topic in entity resolution; it aims to distinguish the different people who share the same name. Conventional name disambiguation approaches rely on rich entity information and cannot disambiguate names when that information is scarce. To address this, the paper proposes a name disambiguation approach based on co-author and affiliation information. An entity relationship graph is constructed from the co-authorship relations among authors and the affiliation relations between authors and institutions, and a breadth-first search strategy is used to find the effective paths between each pair of same-name authors in the graph. The connection strength between two same-name authors is computed from the length and number of effective paths and the types of edges on those paths, and is then compared with a threshold to perform name disambiguation. Experimental results show that the proposed approach achieves better disambiguation than the current best approaches and can also disambiguate authors who have no co-authors.
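The pipeline summarized in the abstract (graph construction, breadth-first path search, connection strength, threshold test) can be illustrated with a minimal sketch. The abstract does not give the paper's definition of an effective path or its strength formula, so everything specific here is an assumption made for illustration: the edge weights in `EDGE_WEIGHT`, the path-length cap `max_len`, the scoring rule (sum over paths of the product of edge weights divided by path length), the `threshold` value, and all function and record names.

```python
from collections import deque

# Hypothetical edge-type weights (an assumption; the abstract gives no values).
EDGE_WEIGHT = {"coauthor": 1.0, "affiliation": 0.5}


def build_graph(records):
    """Build an undirected entity relationship graph.

    Each record is (record_id, coauthor_names, affiliation) and stands for one
    mention of the ambiguous name. Mentions, co-authors, and institutions each
    become nodes; edges are typed "coauthor" or "affiliation".
    """
    graph = {}

    def add_edge(u, v, edge_type):
        graph.setdefault(u, {})[v] = edge_type
        graph.setdefault(v, {})[u] = edge_type

    for rec_id, coauthors, affiliation in records:
        ref = ("ref", rec_id)
        graph.setdefault(ref, {})
        for name in coauthors:
            add_edge(ref, ("author", name), "coauthor")
        if affiliation:
            add_edge(ref, ("org", affiliation), "affiliation")
    return graph


def effective_paths(graph, src, dst, max_len=4):
    """Breadth-first enumeration of simple paths from src to dst.

    Treating "effective" as "simple and at most max_len edges" is an
    assumption. Returns a list of (length, product_of_edge_weights).
    """
    found = []
    queue = deque([(src, (src,), 1.0)])
    while queue:
        node, path, weight = queue.popleft()
        if node == dst:
            found.append((len(path) - 1, weight))
            continue
        if len(path) - 1 >= max_len:
            continue
        for nbr, edge_type in graph.get(node, {}).items():
            if nbr not in path:  # keep paths simple (no repeated nodes)
                queue.append((nbr, path + (nbr,), weight * EDGE_WEIGHT[edge_type]))
    return found


def connection_strength(graph, a, b, max_len=4):
    """Assumed scoring: more, shorter, higher-weight paths give a higher score."""
    return sum(w / length for length, w in effective_paths(graph, a, b, max_len))


def same_person(graph, a, b, threshold=0.1):
    """Decide that two mentions refer to one person if strength reaches threshold."""
    return connection_strength(graph, a, b) >= threshold


# Toy data: four mentions of one ambiguous name (all values are made up).
records = [
    (1, ["A. Li"], "Univ X"),
    (2, ["A. Li"], "Univ Y"),
    (3, [], "Univ Z"),
    (4, [], "Univ X"),  # single-author record, linked only by affiliation
]
graph = build_graph(records)
r1, r2, r3, r4 = (("ref", i) for i in (1, 2, 3, 4))
```

In this toy graph, mentions 1 and 2 are linked through a shared co-author, mentions 1 and 4 only through a shared institution, and mention 3 is isolated from the rest. The affiliation edge is what lets a record with no co-authors still receive a nonzero score, which corresponds to the single-author case the abstract highlights.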

Keywords: data quality; entity resolution; name disambiguation; effective path; connection strength

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
