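Affiliation: [1] School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054
Source: Chinese Journal of Computers (《计算机学报》), 2017, No. 6, pp. 1275-1290 (16 pages)
Funding: Key Program of the National Natural Science Foundation of China (61133016; U1401257); Young Scientists Fund of the National Natural Science Foundation of China (61502087); Sichuan Province High-Tech and Industrialization General Program (2017GZ0308)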
Abstract: Relational inference is one of the key techniques for knowledge base construction; typical application scenarios include relation prediction and entity linking. The problem it addresses is how to infer new knowledge from the knowledge already present in a knowledge base. The inference models adopted by current mainstream knowledge bases fall into two categories: latent factor models and random walk models. Latent factor models map entities and relations into a low-dimensional real-valued vector space and perform inference by computing vector similarity. Random walk models infer relations between entities on the basis of first-order predicate logic and use randomized algorithms to reduce computational complexity. By comparison, latent factor models have high computational complexity because they require large-scale matrix operations, whereas random walk models, because they rely on random sampling, have difficulty fully exploiting the structured information in the knowledge base and therefore suffer from low recall.
This work examines the problems with the basic assumptions of existing random walk models and proposes two new inference modeling assumptions. First, random walk models represented by the Path Ranking Algorithm (PRA) adopt a unidirectionality assumption for relations: entity-relation triples in the knowledge base are regarded as first-order Horn clauses, and each relation is treated as a partial order between subject and object. The paper's hypothesis is that although a relation between two entities is literally and syntactically directional, the information it carries is semantically bidirectional with respect to the two entities it connects, so a relational inference algorithm may use the inverse, object-to-subject semantics of a relation for inference. Second, PRA reasons with first-order predicate logic and introduces a random sampling mechanism to avoid exhaustive search and speed up computation; the paper argues that this is a major reason why PRA and similar algorithms cannot fully use the information already in the knowledge base. It therefore proposes a second hypothesis: the information contained in the topology of relation-specific subnetworks of the knowledge base can be used to improve the inference results of random walk models. To validate these hypotheses, the paper proposes a new relational inference algorithm based on a two-layer random walk strategy. Experiments on public datasets including WN18, FB15K, and FB40K show that the algorithm effectively improves the accuracy and recall of random-walk-based relational inference models and significantly outperforms current mainstream relational inference algorithms based on latent factor models.
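Keywords: relational inference; statistical relational learning; knowledge base population; random walk; path ranking algorithm; artificial intelligence
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]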
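Illustration: the bidirectionality hypothesis in the abstract can be made concrete with a small sketch. This is not the paper's two-layer algorithm, which this record does not specify. It is only a PRA-style path feature computed on a toy graph, and it assumes that each triple also contributes an inverse edge. The triples, the `^-1` suffix for inverse relations, and the function names `build_graph` and `path_probability` are all hypothetical. Path probabilities are computed exactly rather than by sampling, so the result is deterministic.

```python
from collections import defaultdict

# Toy triples (subject, relation, object); purely illustrative data.
TRIPLES = [
    ("alice", "born_in", "paris"),
    ("bob", "born_in", "paris"),
    ("alice", "nationality", "france"),
]


def build_graph(triples, add_inverse=True):
    """Return graph[entity][relation] -> list of neighbouring entities."""
    graph = defaultdict(lambda: defaultdict(list))
    for subj, rel, obj in triples:
        graph[subj][rel].append(obj)
        if add_inverse:
            # Assumption: "rel^-1" names the object-to-subject direction.
            graph[obj][rel + "^-1"].append(subj)
    return graph


def path_probability(graph, source, path):
    """Distribution over end entities for a walker that starts at `source`
    and follows the relation sequence `path`, choosing uniformly among the
    neighbours available at each step. Walkers with no outgoing edge of the
    required relation are dropped."""
    dist = {source: 1.0}
    for rel in path:
        nxt = defaultdict(float)
        for node, prob in dist.items():
            # .get avoids creating empty entries in the defaultdict.
            neighbours = graph.get(node, {}).get(rel, [])
            for n in neighbours:
                nxt[n] += prob / len(neighbours)
        dist = dict(nxt)
    return dist


# Path: bob -born_in-> city -born_in^-1-> person -nationality-> country
PATH = ["born_in", "born_in^-1", "nationality"]
with_inverse = path_probability(build_graph(TRIPLES, True), "bob", PATH)
without_inverse = path_probability(build_graph(TRIPLES, False), "bob", PATH)
```

With inverse edges the walker can step from `paris` back to the people born there, so the path reaches `france` and yields evidence for a candidate `(bob, nationality, france)` triple. Under a strictly one-directional reading of the triples the same path reaches nothing.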