Research on a Linked Data-Oriented Method of Entity Linking Discovery (面向关联数据的实体链接发现方法研究)  Cited by: 7

Linked Data-Oriented Method of Entity Linking Discovery

Authors: GAO Jinsong (高劲松) [1], ZHOU Ximan (周习曼) [1], LIANG Yanqi (梁艳琪) [1]

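Affiliation: [1] School of Information Management, Central China Normal University, Wuhan, Hubei 430079 (author note in the record: professor, doctoral supervisor)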

Source: Journal of Library Science in China (《中国图书馆学报》), 2016, No. 6, pp. 85-101 (17 pages)

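Funding: One of the research outputs of the National Social Science Fund of China general project "Research on the Mechanisms of Knowledge Externalization and Fusion in Knowledge Creation Based on Linked Data" (No. 12BTQ039)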

Abstract: As linked data applications deepen, many datasets have been published on the Web, but the published linked datasets have few links between them, which hinders data sharing and reuse. This study proposes a statistical-learning-based method for recognizing entities and building links across linked datasets. First, for entity matching between datasets, a K-medoids clustering algorithm is used to aggregate attributes and discover the relations among them; matching relations are described for highly correlated attributes, which reduces the number of attribute-matching computations needed during entity matching. Second, the values of the matched attributes are compared to compute the similarity between entities, and entity links are built within the SILK framework, achieving entity linking discovery. Finally, experiments show that the method reduces the number of entity-matching computations between datasets and improves the accuracy of entity links, and that it is feasible and practical.

The World Wide Web has developed into a global data space that links web data and database data. Linked data is one of the best tools for achieving this evolution: it publishes data in a structured form so that resources can be interlinked. As linked data is applied more widely, more and more data are published on the Web as linked data, and previously published web information has also been transformed into linked data in automatic or semi-automatic ways. In practice, however, there are still only a few connections between the released linked datasets, which makes data inconvenient to share. Entity linking discovery makes it possible to discover the real relations between entities, build entity links according to the publishing standard, uncover potential entity links, enhance the interlinking between datasets, and thereby increase the accuracy of published linked data. In this paper, a statistical learning method is proposed to recognize entities and build links across different linked datasets. First, before entities are compared, the method finds class correspondences in order to classify related entity attribute correspondences across datasets. It gives a matching relationship description for highly correlated attributes and reduces the number of calculations needed to match entity attributes. Second, the method compares the similarity of entities by calculating the similarity of the matched attributes, and builds entity links to complete link discovery across different datasets. To cluster the attribute correspondences, the K-medoids clustering algorithm is used to discover potential attribute correspondences. The K-medoids clustering algorithm is mainly aimed at classifying property concepts and corresponding attributes that express the same meaning across datasets, so that the attributes can then be compared and matched in groups. The EDOAL language is then used to define the clustered attributes and describe the correspondences relation bet
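
The two steps named in the abstract, grouping attributes with K-medoids and then scoring entity pairs on the matched attributes, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the string measure (`difflib.SequenceMatcher`), the farthest-first initialisation, the function names and the averaging rule are all assumptions made for illustration, and the sketch uses neither the SILK framework nor EDOAL.

```python
from difflib import SequenceMatcher


def string_distance(a: str, b: str) -> float:
    """Distance in [0, 1]: 1 minus a string similarity ratio (an assumed measure)."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def k_medoids(items, k, distance, max_iter=100):
    """Partition items into k groups by alternating assignment and medoid update."""
    n = len(items)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    dist = [[distance(items[i], items[j]) for j in range(n)] for i in range(n)]

    # Assumed initialisation: start from item 0, then repeatedly add the item
    # farthest from the medoids chosen so far (deterministic, for repeatability).
    medoids = [0]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(max(candidates, key=lambda i: min(dist[i][m] for m in medoids)))

    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each item joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the member with the smallest total distance to the
        # other members becomes the new medoid (empty clusters keep theirs).
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members)) if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return [[items[i] for i in members] for members in clusters.values()]


def entity_similarity(entity_a, entity_b, matched_attributes):
    """Average value similarity over the matched attribute pairs both entities define."""
    scores = [
        1.0 - string_distance(str(entity_a[pa]), str(entity_b[pb]))
        for pa, pb in matched_attributes
        if pa in entity_a and pb in entity_b
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

In this sketch, attribute names from two datasets would be passed to `k_medoids` with `string_distance`, and only attribute pairs that land in the same cluster would be handed to `entity_similarity`, which is how grouping cuts down the number of attribute comparisons. A pair of entities whose score exceeds a chosen threshold would then be a candidate link; the threshold is left to the caller because the abstract does not state one.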

Keywords: linked data; entity linking; data linking; link discovery

Classification number: G254 [Culture and Science: Library Science]

 
