EAE: Enzyme Knowledge Graph Adaptive Embedding

Cited by: 2


Authors: Du Zhijuan (杜治娟)[1]; Zhang Yi (张祎); Meng Xiaofeng (孟小峰)[1]; Wang Qiuyue (王秋月)[1]

Affiliation: [1] School of Information, Renmin University of China, Beijing 100872

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2017, No. 12, pp. 2907-2919 (13 pages)

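Funding: National Natural Science Foundation of China (61379050; 61532010; 91646203; 61532016; 61762082); National Key Research and Development Program of China (2016YFB1000603; 2016YFB1000602); 2017 Henan Province Science and Technology Open Cooperation Project (172106000077); Open Project of the State Key Laboratory of Digital Publishing Technology, Peking University Founder Group Co., Ltd.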

Abstract: In recent years, constructing large-scale knowledge graphs (KGs) and using them to solve practical problems has become a major trend. Embedding representations of KGs make it easier to apply machine learning to relational data such as KGs, and they can promote knowledge analysis, inference, fusion, completion, and even decision making. The construction and embedding of open-domain knowledge graphs (OKGs) has flourished recently, greatly promoting the intelligent use of big data in open domains. Meanwhile, specific-domain knowledge graphs (SKGs) have become an important resource for intelligent applications in specific domains. However, SKGs are still underdeveloped, and their embedding is at an embryonic stage. This is mainly because the data distributions of SKGs and OKGs differ significantly. More specifically: 1) In OKGs such as WordNet and Freebase, the sparsity of head and tail entities is nearly equal, whereas in SKGs such as Enzyme and NCI-PID, inhomogeneity is more common. For example, in the enzyme KG of the microbiology domain there are about 1 000 times as many tail entities as head entities. 2) Head and tail entities can exchange positions in OKGs, but they are non-commutative in SKGs because most relations are attributes. For example, the entity "Obama" can be either a head entity or a tail entity, but the head entity "enzyme" always occupies the head position in the enzyme KG. 3) The breadth of relations has a small skew in OKGs but is highly imbalanced in SKGs. For example, a single enzyme entity can link to as many as 31 809 "x-gene" entities in the enzyme KG. Based on these observations, we propose a novel approach, EAE, to deal with these three issues, and we evaluate it on link prediction and triple classification tasks. Experimental results show that EAE significantly outperforms Trans(E, H, R, D and TransSparse) and achieves state-of-the-art performance.
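The three distribution properties listed in the abstract can be checked on any set of (head, relation, tail) triples. The following is a minimal sketch, not the authors' procedure: the function name `kg_distribution_stats`, the toy entity names, and the exact way each property is measured (in particular, taking "breadth" as the number of distinct tails per head-relation pair) are assumptions made for illustration only.

```python
from collections import defaultdict


def kg_distribution_stats(triples):
    """Summarize three distribution properties of (head, relation, tail) triples.

    The measures below are illustrative assumptions, not definitions from the paper.
    """
    heads = {h for h, _, _ in triples}
    tails = {t for _, _, t in triples}

    # 1) Inhomogeneity: distinct tail entities per distinct head entity.
    tail_head_ratio = len(tails) / len(heads)

    # 2) Non-commutativity: share of entities seen in BOTH head and tail positions.
    #    A value near 0 means heads and tails rarely exchange positions.
    commuting_share = len(heads & tails) / len(heads | tails)

    # 3) Imbalance: largest number of distinct tails linked from one (head, relation) pair.
    fanout = defaultdict(set)
    for h, r, t in triples:
        fanout[(h, r)].add(t)
    max_breadth = max(len(ts) for ts in fanout.values())

    return tail_head_ratio, commuting_share, max_breadth


# Hypothetical toy data; only the relation name "x-gene" comes from the abstract.
toy_triples = [
    ("enzyme_1", "x-gene", "gene_a"),
    ("enzyme_1", "x-gene", "gene_b"),
    ("enzyme_1", "x-gene", "gene_c"),
    ("enzyme_2", "x-gene", "gene_d"),
]
ratio, share, breadth = kg_distribution_stats(toy_triples)
```

On an SKG such as the enzyme KG described above, one would expect a large tail-to-head ratio, a commuting share close to zero, and a very large maximum breadth; on an OKG the first would be near 1 and the second noticeably higher.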

Keywords: specific-domain knowledge graph; embedding representation; inhomogeneity; non-commutativity; imbalance

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
