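Authors: Huang Junfu, Li Tianrui[1], Jia Zhen[1], Jing Yunge[1], Zhang Tao[1]
Affiliation: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756
Source: Journal of Computer Applications (《计算机应用》), 2016, No. 7, pp. 1881-1886, 1898 (7 pages)
Funding: National Natural Science Foundation of China (61573292; 61572407); Fundamental Research Funds for the Central Universities (2682015CX070)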
Abstract: Traditional entity alignment methods perform poorly when aligning entities across heterogeneous Chinese online encyclopedias. To address this problem, an entity alignment method that combines entity attributes with the topic features of entity contexts was proposed. First, a heterogeneous Chinese encyclopedia knowledge base was constructed from Baidu Baike and Hudong Baike data, and a Resource Description Framework Schema (RDFS) vocabulary was built by statistical methods to normalize entity attributes. Second, the context of each entity was extracted and segmented into Chinese words; the contexts were modeled with a topic model whose parameters were estimated by Gibbs sampling, yielding a topic-word probability matrix from which a set of characteristic words and the corresponding feature matrix were extracted. Third, the Longest Common Subsequence (LCS) algorithm was used to compute the similarity of entity attributes; when this similarity fell between a lower and an upper bound, the topic features of the entities' contexts were additionally used to decide whether the entities should be aligned. Finally, following a standard procedure, an entity alignment data set of heterogeneous Chinese encyclopedias was constructed for simulation experiments.
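The abstract describes a two-stage decision: LCS-based attribute similarity first, with context topic features consulted only when that similarity falls between a lower and an upper bound. The Python sketch below illustrates that control flow under assumptions that do not come from the paper: LCS similarity is normalized by the length of the longer string, per-attribute scores are averaged over the keys both entities share (assumed already normalized against the RDFS vocabulary), topic features are compared by cosine similarity of per-entity topic distributions, and the names `is_aligned`, `lower`, `upper` and `topic_threshold` and their default values are hypothetical. It is an illustration, not the authors' implementation.

```python
from math import sqrt


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalization: LCS length divided by the longer string's length."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def attribute_similarity(attrs1: dict, attrs2: dict) -> float:
    """Average LCS similarity over shared (assumed RDFS-normalized) attribute keys."""
    shared = attrs1.keys() & attrs2.keys()
    if not shared:
        return 0.0
    return sum(lcs_similarity(attrs1[k], attrs2[k]) for k in shared) / len(shared)


def cosine(u: list, v: list) -> float:
    """Cosine similarity of two topic-distribution vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def is_aligned(attrs1, attrs2, topics1, topics2,
               lower=0.3, upper=0.8, topic_threshold=0.5) -> bool:
    """Two-threshold decision; lower, upper and topic_threshold are hypothetical values."""
    sim = attribute_similarity(attrs1, attrs2)
    if sim >= upper:
        return True   # attributes alone are decisive for a match
    if sim <= lower:
        return False  # attributes alone rule out a match
    # Ambiguous band between the bounds: fall back to context topic features.
    return cosine(topics1, topics2) >= topic_threshold


if __name__ == "__main__":
    baidu = {"name": "张三", "occupation": "教授"}      # hypothetical record
    hudong = {"name": "张三丰", "occupation": "副教授"}  # hypothetical record
    print(round(attribute_similarity(baidu, hudong), 3))  # 0.667, inside the assumed band
    print(is_aligned(baidu, hudong, [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))  # True
    print(is_aligned(baidu, hudong, [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]))  # False
```

Routing only the ambiguous band to the topic comparison keeps the cheaper string check as the default path, which mirrors the ordering the abstract describes; how the paper actually scores topic features is not stated here.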
Compared with the classical attribute similarity algorithm, the attribute-weighting algorithm, the context term-frequency feature model and the topic model algorithm, the proposed method achieves a precision, recall and F-score of 97.8%, 88.0% and 92.6% in the people domain and 98.6%, 73.0% and 83.9% in the film and television domain, a considerable improvement over all the other methods. The results also indicate that the proposed method effectively improves entity alignment when constructing a heterogeneous Chinese encyclopedia knowledge base, and that it can be applied to entity alignment tasks in which context information is available.
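The reported F-scores match the balanced F-measure (F1), the harmonic mean of the first two figures in each triple, which indicates that the first figure is precision: for the people domain, 2 × 0.978 × 0.880 / (0.978 + 0.880) ≈ 0.926. The sketch below, with hypothetical function names, computes precision, recall and F1 over sets of predicted and gold-standard alignment pairs and rechecks both reported F-scores; it assumes alignments are represented as hashable (entity, entity) pairs.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(predicted: set, gold: set) -> tuple:
    """Precision, recall and F1 for a set of predicted alignment pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


# Recompute F1 from the reported precision and recall of each domain.
for domain, p, r in [("people", 0.978, 0.880), ("film/TV", 0.986, 0.730)]:
    print(f"{domain}: F1 = {f_score(p, r):.3f}")
# people: F1 = 0.926
# film/TV: F1 = 0.839
```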
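Keywords: knowledge base; entity alignment; topic model; Resource Description Framework Schema (RDFS); Longest Common Subsequence (LCS) algorithm
CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]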