Authors: 张琪 (ZHANG Qi) [1]; 钟昊 (ZHONG Hao) [2]
Affiliations: [1] School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363, China; [2] School of Computer Science, South China Normal University, Guangzhou 510631, China
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2024, No. 7, pp. 1806-1813 (8 pages)
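Funding: National Key Research and Development Program of China (2023YFC3341200); National Natural Science Foundation of China (62377015); South China Normal University Young Teachers Research Cultivation Fund (23KJ29).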
Abstract: The continuing growth of knowledge graphs has made entity summarization an active research topic. The goal of entity summarization is to obtain a concise description of an entity from the large set of triple-structured facts that describe it. This work proposes a submodular optimization method for entity summarization based on a large language model. First, the descriptive information of the entities, relations, and properties in the triples is embedded with a large language model, which captures the semantics of the triples and yields semantically rich embedding vectors. Second, based on these embedding vectors, a relevance measure is defined between any two triples that describe the same entity; the higher the relevance between two triples, the more similar the information they contain. Finally, based on this relevance measure, a normalized and monotonically non-decreasing submodular function is defined, so that entity summarization is modeled as a submodular function maximization problem and a greedy algorithm with a performance guarantee can be applied directly to extract entity summaries. The method is tested on three public benchmark datasets, and the quality of the extracted summaries is evaluated with two metrics, F1 score and normalized discounted cumulative gain (NDCG). Experimental results show that the proposed method significantly outperforms current state-of-the-art methods.
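The pipeline the abstract describes (embed triples, score pairwise relevance, maximize a submodular function greedily) can be illustrated with a minimal sketch. The abstract does not give the paper's actual relevance measure or submodular function, so this sketch substitutes two common stand-ins and should not be read as the authors' method: cosine similarity clipped to be non-negative as the relevance, and the facility-location function as a normalized, monotone non-decreasing submodular objective. The function names and the idea of passing in precomputed embedding vectors are hypothetical; in practice the vectors would come from a large language model.

```python
import numpy as np


def cosine_similarity_matrix(emb):
    """Pairwise relevance between triples (ASSUMPTION: clipped cosine similarity)."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sim = unit @ unit.T
    return np.clip(sim, 0.0, None)  # non-negative entries keep the objective monotone


def facility_location(sim, selected):
    """ASSUMED objective: f(S) = sum_i max_{j in S} sim[i, j], with f(empty) = 0."""
    if not selected:
        return 0.0
    return float(sim[:, selected].max(axis=1).sum())


def greedy_summary(sim, k):
    """Pick up to k triples, each time adding the one with the largest marginal gain."""
    selected = []
    remaining = set(range(sim.shape[0]))
    for _ in range(min(k, sim.shape[0])):
        base = facility_location(sim, selected)
        best = max(
            sorted(remaining),
            key=lambda j: facility_location(sim, selected + [j]) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

For a normalized, monotone submodular objective under a cardinality constraint, this greedy procedure is the classical algorithm with a (1 - 1/e) approximation guarantee, which is the kind of performance guarantee the abstract refers to. Calling `greedy_summary(cosine_similarity_matrix(emb), k)` on an array `emb` with one row per triple returns the indices of the selected triples.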
Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]