Authors: YOU Xindong (游新冬); WEN Yingzi (问英姿); SHE Xinpeng (佘鑫鹏); LYU Xueqiang (吕学强) [1]
Affiliation: [1] Beijing Key Laboratory of Network Culture and Digital Communication (Beijing Information Science and Technology University), Beijing 100101, China
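Source: Journal of Computer Applications (《计算机应用》), 2024, No. 7, pp. 2026-2033 (8 pages)
Funding: State Language Commission project (ZDI145-10); Beijing Natural Science Foundation (4212020); Huaneng Group headquarters science and technology project (HNKJ21-HF43).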
Abstract: To address the scarcity of domain-specific corpora, the insufficient mining of relation-type features, and the presence of overlapping triplets in texts from the electromechanical equipment domain, a triplet extraction method named TBPA (Triplet extraction Based on Prompt and Antagonistic training) is proposed. It combines prompt learning with prior knowledge and uses iterative adversarial training. First, the BERT (Bidirectional Encoder Representations from Transformers) model is fine-tuned on a self-constructed corpus to obtain feature vectors for the input text. Next, iterative adversarial training with the Projected Gradient Descent (PGD) method is applied at the embedding layer to improve the model's resistance to perturbed samples and its generalization to real samples. Then, a single-layer head-tail pointer network identifies the head entity, a prompt learning template supplies the domain-specific prior features corresponding to that head entity, and the character vectors are combined with the prompt vectors predicted by the template. Finally, within a hierarchical annotation framework, a single-layer head-tail pointer network identifies, one relation at a time, the tail entities for every predefined relation type. Compared with the baseline model CasRel, TBPA improves precision, recall, and F1 score by 3.10, 6.12, and 4.88 percentage points, respectively. The experimental results indicate that TBPA has an advantage in triplet extraction tasks in the coal mine electromechanical equipment domain.
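The PGD step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a toy logistic classifier stands in for the BERT encoder and pointer network, and every name and hyperparameter here (`pgd_perturbation`, `epsilon`, `alpha`, `steps`, the example weights) is an assumption made for illustration. The abstract does not state the paper's actual settings. The sketch shows only the core idea: repeatedly step the embedding perturbation along the normalized loss gradient, then project it back into an L2 ball of radius `epsilon`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def bce_loss(w, b, e, y):
    """Binary cross-entropy of a toy logistic classifier on embedding e."""
    p = sigmoid(sum(wi * ei for wi, ei in zip(w, e)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def grad_wrt_embedding(w, b, e, y):
    """Gradient of the loss above with respect to the embedding e."""
    p = sigmoid(sum(wi * ei for wi, ei in zip(w, e)) + b)
    return [(p - y) * wi for wi in w]


def pgd_perturbation(w, b, e, y, epsilon=1.0, alpha=0.3, steps=5):
    """Build an adversarial perturbation of e with projected gradient steps."""
    delta = [0.0] * len(e)
    for _ in range(steps):
        perturbed = [ei + di for ei, di in zip(e, delta)]
        g = grad_wrt_embedding(w, b, perturbed, y)
        g_norm = l2_norm(g)
        if g_norm == 0.0:
            break  # no gradient signal, nothing to ascend
        # Ascent step along the normalized gradient (increases the loss).
        delta = [di + alpha * gi / g_norm for di, gi in zip(delta, g)]
        # Projection: keep the perturbation inside the L2 ball of radius epsilon.
        d_norm = l2_norm(delta)
        if d_norm > epsilon:
            delta = [di * epsilon / d_norm for di in delta]
    return delta


# Hypothetical toy values, chosen only to exercise the functions.
w, b = [0.5, -1.0, 0.25], 0.1
e, y = [1.0, 0.5, -0.5], 1

delta = pgd_perturbation(w, b, e, y)
clean_loss = bce_loss(w, b, e, y)
adv_loss = bce_loss(w, b, [ei + di for ei, di in zip(e, delta)], y)
print(f"clean loss = {clean_loss:.4f}, adversarial loss = {adv_loss:.4f}")
```

In adversarial training, the loss on the perturbed embedding would typically be backpropagated along with the clean loss before the model parameters are updated; that outer training loop is omitted here.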
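Keywords: coal mine electromechanical equipment; triplet extraction; prompt learning; iterative adversarial training; self-constructed corpus
Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]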