Authors: PAN Jian [1,2], WU Zhiwei, LI Yanjun
Affiliations: [1] College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China; [2] Zhijiang College of Zhejiang University of Technology, Shaoxing, Zhejiang 312030, China
Source: Computer Science (《计算机科学》), 2025, No. 4, pp. 262-270 (9 pages)
Funding: Exploration Project of the Natural Science Foundation of Zhejiang Province (LGF20F020015).
Abstract: Current research on entity linking pays little attention to Chinese entity linking or to the linking of emerging and little-known entities. In addition, traditional BERT models ignore two key aspects of Chinese, glyphs and radicals, both of which carry important syntactic and semantic information for language understanding. To address these problems, this paper proposes CGR-BERT-ZESHEL, a zero-shot entity linking model based on Chinese features. The model first feeds in glyph features through a visual image embedding and radical features through a traditional character embedding, which enriches the word vector features and mitigates the effect of out-of-vocabulary words on model performance. It then obtains the entity linking result with a two-stage method: candidate entity generation followed by candidate entity ranking. In experiments on the Hansel and CLEEK datasets, CGR-BERT-ZESHEL improves Recall@100 in the candidate entity generation stage by 17.49% and 7.34% over the baseline model, and improves Accuracy in the candidate entity ranking stage by 3.02% and 3.11%. It also outperforms the other compared models on both Recall@100 and Accuracy.
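The two metrics named in the abstract can be illustrated with a minimal sketch. It uses the common textbook definitions of Recall@k (the share of mentions whose gold entity appears among the top-k generated candidates) and Accuracy (the share of mentions whose top-ranked entity is the gold entity, computed here over all mentions). Both definitions are assumptions; the record does not state the paper's exact evaluation protocol. The function names and the entity IDs are hypothetical.

```python
from typing import Sequence


def recall_at_k(candidates: Sequence[Sequence[str]],
                gold: Sequence[str],
                k: int = 100) -> float:
    """Share of mentions whose gold entity is among the first k candidates."""
    if not gold or len(candidates) != len(gold):
        raise ValueError("candidates and gold must be non-empty and aligned")
    hits = sum(1 for cands, g in zip(candidates, gold) if g in cands[:k])
    return hits / len(gold)


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Share of mentions whose top-ranked entity equals the gold entity."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and aligned")
    correct = sum(1 for p, g in zip(predictions, gold) if p == g)
    return correct / len(gold)


# Toy data with hypothetical entity IDs: one candidate list per mention,
# ordered from most to least similar (stage 1: candidate generation).
toy_candidates = [["Q1", "Q2", "Q3"], ["Q9", "Q4"], ["Q7"]]
toy_gold = ["Q2", "Q4", "Q5"]

# Top-ranked entity per mention after re-ranking (stage 2: candidate ranking).
toy_predictions = ["Q2", "Q9", "Q7"]

if __name__ == "__main__":
    print("Recall@2:", recall_at_k(toy_candidates, toy_gold, k=2))
    print("Accuracy:", accuracy(toy_predictions, toy_gold))
```

In a two-stage pipeline, Recall@k caps what the ranking stage can achieve: if the gold entity is missing from the candidate list, no re-ranking can recover it.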
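Keywords: entity linking; Chinese zero-shot; BERT; candidate entity generation; candidate entity ranking
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]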