Quality Verification Method for Knowledge Graph Based on Semantic and Structural Trustworthiness  Cited by: 2

Authors: YE Qi; ZHANG Yiqian; RUAN Tong[1]; DU Wen (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; DS Information Technology Co., Ltd., Shanghai 200032, China)

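Affiliations: [1] School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China  [2] DS Information Technology Co., Ltd., Shanghai 200032, China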

Source: Computer Engineering, 2023, No. 5, pp. 48-55 (8 pages)

Funding: National Key Research and Development Program of China (2021YFC2701800, 2021YFC2701801).

Abstract: Knowledge graphs are widely used in artificial intelligence tasks such as question answering and information retrieval owing to their strong expressive ability and interpretability. However, the extensive use of automated knowledge graph construction techniques in practical scenarios inevitably introduces noise and conflicts, which seriously degrade the performance of downstream applications. To detect potential noise in a knowledge graph, preserve authentic and credible triples, and provide high-quality knowledge for downstream tasks, a triple evaluation model based on dual semantic and structural trustworthiness is proposed. The model comprises a semantic authenticity evaluator and a structural authenticity evaluator. The former converts triples into sentence sequences through specific rules and measures semantic authenticity with the Bidirectional Encoder Representations from Transformers (BERT) model. The latter obtains vector representations of entities and relations from a representation learning model and measures structural authenticity at two levels: knowledge representation and path features. Experimental results on four real-world graph datasets show that the accuracy, precision, recall, F1 value, and other evaluation metrics of the proposed model are 3% to 4% higher than those of models such as TransE-Random Forest Classifier (TransE-RFC), TransE-K Nearest Neighbor Classifier (TransE-KNC), and TransE-eXtreme Gradient Boosting (TransE-XGB). The proposed model effectively detects noise errors in noisy graph datasets while preserving authentic and credible knowledge to the maximum extent.
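The two evaluators described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The relation templates, the function names, and the choice of a TransE-style distance for the structural side are all assumptions made for illustration; the abstract only states that triples are converted to sentences "through specific rules" and that embeddings come from a representation learning model. The BERT scoring step and the path-feature level are omitted.

```python
import math

# Hypothetical relation templates; the paper's actual conversion rules
# are not given in the abstract.
TEMPLATES = {
    "born_in": "{head} was born in {tail}.",
    "capital_of": "{head} is the capital of {tail}.",
}

def triple_to_sentence(head, relation, tail):
    """Turn a (head, relation, tail) triple into a sentence that a
    language model such as BERT could then score."""
    fallback = "{head} " + relation.replace("_", " ") + " {tail}."
    template = TEMPLATES.get(relation, fallback)
    return template.format(head=head, tail=tail)

def transe_distance(h, r, t):
    """L2 norm of h + r - t (assumed TransE-style scoring);
    a smaller distance means the triple fits the embeddings better."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def structural_score(h, r, t):
    """Map the distance into (0, 1]; an illustrative squashing only,
    not the paper's formula."""
    return 1.0 / (1.0 + transe_distance(h, r, t))
```

For instance, `triple_to_sentence("Paris", "capital_of", "France")` yields a sentence suitable for a sentence-level classifier, while `structural_score` returns 1.0 only when the head embedding plus the relation embedding lands exactly on the tail embedding.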

Keywords: knowledge graph; quality verification; triple trustworthiness evaluation; semantic authenticity; structural trustworthiness

Classification: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]

 
