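Authors: 高劲松 (GAO Jinsong)[1], 周习曼 (ZHOU Ximan)[1], 梁艳琪 (LIANG Yanqi)[1]
Affiliation: [1] School of Information Management, Central China Normal University (professor, doctoral supervisor), Wuhan, Hubei 430079
Source: Journal of Library Science in China (《中国图书馆学报》), 2016, No. 6, pp. 85-101 (17 pages)
Funding: One of the research outcomes of the National Social Science Fund of China general project "Research on the mechanisms of knowledge externalization and fusion in knowledge creation based on linked data" (No. 12BTQ039)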
Abstract: As linked data applications have deepened, many datasets have been published on the Web, but the published linked datasets are still sparsely interlinked, which makes the data inconvenient to share and reuse. This study proposes a method based on statistical learning for recognizing entities and building links across linked datasets. First, to prepare for entity matching between datasets, a K-medoids clustering algorithm aggregates attributes and discovers the relations among them; matching relations are then described for highly correlated attributes, which reduces the number of attribute-matching computations needed during entity matching. Second, the values of the matched attributes are compared for similarity to obtain the similarity between entities, and entity links are built within the SILK framework to achieve entity link discovery. Finally, experiments show that the method reduces the number of entity-matching computations between datasets and improves the correctness of entity links, indicating that it is feasible and practical.
Extended abstract: The World Wide Web has developed into a global data space that links web data and database data, and linked data is one of the best tools for achieving this evolution. Linked data publishes data in a structured form so that resources can be interlinked. As linked data is applied more deeply, more and more data are published on the Web as linked data, and already published web information has also been transformed into linked data in automatic or semi-automatic ways. In practice, however, there are still only a few connections between the released linked datasets, which makes the data inconvenient to share. Entity link discovery addresses this: it discovers the real relations between entities, builds entity links according to the publishing standard, uncovers potential entity links, strengthens the interlinking between datasets, and thereby increases the accuracy of published linked data. In this study, a statistical learning method is proposed to recognize entities and build links across different linked datasets. First, before any entities are compared, the method finds class correspondences in order to group related entity attribute correspondences across datasets. It gives a matching relationship description for highly correlated attributes, which reduces the number of computations needed to match entity attributes. Second, the method compares the similarity of entities by calculating the similarity of the matched attributes, and builds entity links to complete link discovery across different datasets. To cluster the attribute correspondences, we use the K-medoids clustering algorithm to discover potential attribute correspondences. The K-medoids clustering algorithm is mainly aimed at classifying property concepts and corresponding attributes that express the same meaning in different datasets, so that the attributes can then be compared and matched in groups. The EDOAL language is then used to define the clustered attributes and describe the correspondences relation bet
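The attribute-grouping step described in the abstract can be illustrated with a small K-medoids sketch. This is not the authors' implementation: the abstract does not say which features or distance measure are used, so the sketch assumes a simple string distance over attribute names (`name_distance`), and the attribute list `ATTRS`, the choice `k=3`, and the helper names are all hypothetical. It also assumes the attribute names are distinct after lowercasing.

```python
import random
from difflib import SequenceMatcher


def name_distance(a: str, b: str) -> float:
    """Assumed distance: 1 minus the difflib similarity ratio of the lowercased names."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def assign(items, medoids, dist):
    """Assign every item index to the cluster of its nearest medoid index."""
    clusters = {m: [] for m in medoids}
    for i, item in enumerate(items):
        nearest = min(medoids, key=lambda m: dist(item, items[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(items, k, dist, max_iter=100, seed=0):
    """Alternate assignment and medoid update until the medoids stop changing."""
    rng = random.Random(seed)
    medoids = sorted(rng.sample(range(len(items)), k))
    for _ in range(max_iter):
        clusters = assign(items, medoids, dist)
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster ended up empty
                new_medoids.append(m)
                continue
            # The new medoid is the member with the smallest total distance to the others.
            best = min(members, key=lambda c: sum(dist(items[c], items[j]) for j in members))
            new_medoids.append(best)
        new_medoids = sorted(new_medoids)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    clusters = assign(items, medoids, dist)
    return {items[m]: sorted(items[i] for i in members) for m, members in clusters.items()}


# Hypothetical attribute names drawn from two imaginary datasets.
ATTRS = [
    "birthDate", "dateOfBirth", "birth_date",
    "fullName", "name", "personName",
    "homepage", "homePageUrl", "webpage",
]

groups = k_medoids(ATTRS, 3, name_distance)
for medoid, members in groups.items():
    print(medoid, "->", members)
```

Each resulting group is a set of candidate attribute correspondences, so later value comparisons only need to be made within a group rather than across every attribute pair. A real pipeline would likely use richer features than attribute names, such as value distributions or datatype information.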