Entity Similarity Metrics and Alignment Method Based on the Union of Multi-Side Information Representations

Authors: ZHU Hong [1], WANG Kuoran, ZHU Tong [2]

Affiliations: [1] School of Artificial Intelligence, China University of Mining and Technology (Beijing), Beijing 100083, China; [2] Archives, China University of Mining and Technology (Beijing), Beijing 100083, China

Source: Computer Engineering, 2025, No. 3, pp. 64-75 (12 pages)

Funding: 2022 Scientific Research Project of the Beijing Municipal Archives Bureau.

Abstract: Entity alignment aims to identify different instances of the same object in different knowledge graphs. However, heterogeneity between graphs makes the structures and representations of equivalent instances inconsistent, which reduces alignment accuracy. This paper proposes an entity similarity measurement method for heterogeneous graphs that combines representations of an entity's main information with representations of its multi-side information, and applies it to entity alignment. The main information comprises the entity name and description; the side information comprises entity attributes, relations, and the descriptions of related entities. To address the alignment interference caused by structural heterogeneity between equivalent entities across graphs, a similarity metric called UnMuSIR-SM&EA, which combines the semantic representations of an entity's multi-side information, is proposed for entity alignment. To improve the representation consistency of synonymous information, a representation learning model is introduced to obtain semantic representations of each type of entity information. To resolve the inconsistent similarity scales among synonyms caused by anisotropy in the embedding space of the representation learning model, a fine-tuning method based on contrastive learning over entity main information is designed to optimize these semantic representations. Experimental results show that on the DIS_(ZH-EN) dataset, which has large structural differences, the method reaches a Hits@1 of 95.2%, 16.8 percentage points higher than BERT-INT, a model based on side information. On the DBP15K_(ZH-EN), DBP15K_(JA-EN), and DBP15K_(FR-EN) subsets of DBP15K, Hits@1 reaches 95.7%, 96.0%, and 98.9%, respectively, and on the DBP-WD dataset it reaches 99.4%. The proposed model thus performs very well on entity alignment tasks.

Keywords: entity alignment; knowledge graph; similarity metric; contrastive learning; pre-trained model

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
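
Note on the metric: the abstract reports results as Hits@1, the fraction of source entities whose correct counterpart is ranked first among all candidate target entities. The sketch below is a minimal illustration of how Hits@k is commonly computed from entity embeddings, assuming cosine similarity as the scoring function. It is not the authors' UnMuSIR-SM&EA implementation; the function names (`cosine`, `hits_at_k`) and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hits_at_k(source_vecs, target_vecs, gold, k=1):
    """Fraction of source entities whose gold target is among the k most similar targets.

    gold[i] is the index in target_vecs of the entity equivalent to source entity i.
    """
    hits = 0
    for i, src in enumerate(source_vecs):
        scores = [cosine(src, tgt) for tgt in target_vecs]
        # Rank candidate target indices from most to least similar.
        ranked = sorted(range(len(target_vecs)), key=lambda j: scores[j], reverse=True)
        if gold[i] in ranked[:k]:
            hits += 1
    return hits / len(source_vecs)


# Toy embeddings, made up for illustration only (not data from the paper).
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [[0.9, 0.1], [0.2, 0.8], [1.0, -1.0]]
gold = [0, 1, 2]

print(f"Hits@1 = {hits_at_k(source, target, gold, k=1):.3f}")
print(f"Hits@3 = {hits_at_k(source, target, gold, k=3):.3f}")
```

In this toy setup the third source entity is closer to the wrong target, so only two of the three entities are matched at rank 1. A method that combines several information sources, as the abstract describes, would replace the single cosine score with a combined similarity before ranking.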

 
