Authors: LI Yuxian (李育贤); LÜ Xueqiang (吕学强) [1]; YOU Xindong (游新冬)
Affiliation: [1] Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China
Source: Journal of Beijing Information Science and Technology University (《北京信息科技大学学报(自然科学版)》), 2022, No. 3, pp. 74-81 (8 pages)
Funding: National Natural Science Foundation of China (62171043); Research Program of the State Language Commission (ZDI145-10; YB145-3); Beijing Natural Science Foundation (4212020).
Abstract: The weaponry domain contains a considerable number of long terms that current mainstream term extraction models do not identify well. To address this problem, a head-tail pointer network model that fuses glyph information is proposed for extracting terms in this domain. First, the bidirectional encoder representation from transformers (BERT) pre-trained model is used to obtain a vector representation of each character, and the character's Wubi encoding is concatenated to it, enhancing the character representation with glyph information. Second, a head-tail pointer network decodes term boundaries directly, which helps identify long terms. Finally, Focal Loss is used as the loss function to alleviate the label imbalance caused both by the low proportion of terms in the total vocabulary and by the use of a head-tail pointer network as the decoder. Experiments show that the proposed model achieves an F1 value of 91.25% on term extraction in the weaponry domain, an improvement over mainstream models.
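The two decoding-side ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no hyperparameters or decoding rule, so the `gamma`, `alpha`, and `threshold` values, the function names, and the "nearest tail at or after each head" pairing rule are all assumptions chosen for illustration. The loss follows the standard binary Focal Loss form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t).

```python
import math


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-8):
    """Mean binary focal loss over per-character predictions.

    probs  -- predicted probability that each character is a boundary
    labels -- gold 0/1 labels of the same length
    gamma, alpha -- assumed values; the abstract does not state them
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p          # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma down-weights examples the model already gets right
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(probs)


def decode_spans(head_probs, tail_probs, threshold=0.5):
    """Pair each predicted head with the nearest predicted tail at or after it.

    Returns inclusive (start, end) character index pairs. The pairing rule
    is an assumed, commonly used heuristic, not one taken from the paper.
    """
    spans = []
    for i, h in enumerate(head_probs):
        if h <= threshold:
            continue
        for j in range(i, len(tail_probs)):
            if tail_probs[j] > threshold:
                spans.append((i, j))
                break
    return spans


# Hypothetical scores for a five-character sentence.
heads = [0.9, 0.1, 0.1, 0.8, 0.2]
tails = [0.1, 0.1, 0.7, 0.1, 0.9]
print(decode_spans(heads, tails))  # [(0, 2), (3, 4)]
```

Because each span is read off from one head position and one tail position, its length does not depend on labelling every interior character correctly, which is the property the abstract relies on for long terms. The same design makes positive labels sparse (at most two per term), which is the imbalance Focal Loss is meant to offset.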
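Keywords: weaponry domain; term extraction; BERT; Wubi encoding; head-tail pointer network; Focal Loss
Classification No.: TP241.2 [Automation and Computer Technology: Detection Technology and Automatic Devices]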