Authors: 孙丽郡 (SUN Lijun); 徐行健 (XU Xingjian); 孟繁军 (MENG Fanjun)
Affiliation: [1] College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, Inner Mongolia, China
Source: 《应用科学学报》 (Journal of Applied Sciences), 2025, No. 2, pp. 334-347 (14 pages)
Funding: Supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (No. 2023LHMSS06011, No. 2023MS06016) and the Inner Mongolia Normal University Undergraduate Innovation and Entrepreneurship Training Program (No. 202310153007).
Abstract: Entity-relation joint extraction, a core step in knowledge graph construction, aims to extract entity-relation triples from unstructured text. Existing joint extraction methods do not handle the interaction between entities and relations effectively during decoding, which leads to insufficient context understanding and redundant information. To address these problems, we propose an entity-relation joint extraction model based on parallel decoding and clustering. First, the bidirectional encoder representations from transformers (BERT) model encodes the text to obtain character vectors rich in semantic information. Next, a non-autoregressive parallel decoder strengthens the interaction between entities and relations, and hierarchical agglomerative clustering combined with a majority voting mechanism further refines the decoding results to capture contextual information and reduce redundancy. Finally, a high-quality set of triples is generated and used to construct a curriculum knowledge graph. The method is evaluated on the public datasets NYT and WebNLG and on a self-constructed C language dataset; the results show that it outperforms the compared models in precision and F1 score.
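Keywords: joint extraction; parallel decoding; hierarchical agglomerative clustering; majority voting mechanism; curriculum knowledge graph
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]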
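
The abstract names hierarchical agglomerative clustering followed by majority voting as the step that removes redundant triples from the parallel decoder's output, but it does not give the features, distance measure, linkage, or stopping rule. The following is a minimal sketch of that general idea only, not the authors' implementation. Everything in it is an assumption made for illustration: the token-level Jaccard distance, average linkage, the `threshold` value, the function names, and the toy C-language triples.

```python
from collections import Counter
from itertools import combinations

# A triple is (head entity, relation, tail entity).
Triple = tuple[str, str, str]


def triple_distance(a: Triple, b: Triple) -> float:
    """Jaccard distance over lowercase tokens (assumed metric, not from the paper)."""
    tokens_a = set(" ".join(a).lower().split())
    tokens_b = set(" ".join(b).lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(union)


def agglomerative_clusters(triples: list[Triple], threshold: float) -> list[list[Triple]]:
    """Bottom-up average-linkage clustering.

    Starts with one cluster per predicted triple and repeatedly merges the
    closest pair until the closest pair is farther apart than `threshold`.
    """
    clusters = [[t] for t in triples]
    while len(clusters) > 1:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(triple_distance(a, b) for a in clusters[i] for b in clusters[j])
            dist = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or dist < best[0]:
                best = (dist, i, j)
        dist, i, j = best
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def majority_vote(cluster: list[Triple]) -> Triple:
    """Return the most frequent triple in a cluster (first seen wins ties)."""
    return Counter(cluster).most_common(1)[0][0]


def deduplicate(triples: list[Triple], threshold: float = 0.5) -> list[Triple]:
    """Cluster near-duplicate predictions, then keep one voted triple per cluster."""
    return [majority_vote(c) for c in agglomerative_clusters(triples, threshold)]


# Hypothetical decoder output: a fixed set of queries often yields repeats
# and near-duplicates of the same underlying triple.
predicted = [
    ("pointer", "stores", "memory address"),
    ("pointer", "stores", "memory address"),
    ("pointer", "stores", "address"),
    ("array", "contains", "element"),
    ("array", "contains", "element"),
]

print(deduplicate(predicted))
# [('pointer', 'stores', 'memory address'), ('array', 'contains', 'element')]
```

In this toy run the three pointer predictions fall into one cluster and the vote keeps the variant predicted twice, so five raw predictions reduce to two triples. A real system would more plausibly cluster on learned embeddings rather than token overlap.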