Authors: XU Guanyou; FENG Weisen (College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China)
Source: Journal of Computer Applications, 2022, No. 9, pp. 2693-2700 (8 pages)
Abstract: Recent character-based Named Entity Recognition (NER) models cannot make full use of word information, and lattice-structure models that do use word information may degenerate into word-based models and suffer from word segmentation errors. To address these problems, a Transformer-based Python NER model was proposed to encode character-word information. First, word information was bound to the characters corresponding to the beginning or end of each word. Then, three different strategies were used to encode the word information into a fixed-size representation through the Transformer. Finally, a Conditional Random Field (CRF) was used for decoding, thereby avoiding the word segmentation errors caused by obtaining word boundary information and improving batch-training speed. Experimental results on the Python dataset show that the F1 score of the proposed model is 2.64 percentage points higher than that of the Lattice-LSTM model, while its training time is about a quarter of that of the comparison model, indicating that the proposed model can prevent model degradation, speed up batch training, and better recognize Python named entities.
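The character-word binding step described in the abstract can be sketched in plain Python. The function names, the toy lexicon, and the three pooling strategies below are illustrative assumptions for exposition only; the paper does not specify its exact strategies here, and a real model would pool learned embedding vectors inside the Transformer rather than raw lists.

```python
def bind_words_to_chars(sentence, lexicon):
    """Bind every lexicon word that matches a span of the sentence to the
    characters at its beginning and end, as the abstract describes.
    Returns {char_index: [matched words]}."""
    bound = {i: [] for i in range(len(sentence))}
    for start in range(len(sentence)):
        for end in range(start + 1, len(sentence) + 1):
            word = sentence[start:end]
            if word in lexicon:
                bound[start].append(word)    # bind to the begin character
                bound[end - 1].append(word)  # bind to the end character
    return bound


def pool(vectors, strategy="mean"):
    """Collapse a variable number of word vectors into one fixed-size vector.
    Three toy strategies (mean, max, sum) stand in for the paper's three
    unspecified encoding strategies."""
    if not vectors:
        return None
    dim = len(vectors[0])
    if strategy == "mean":
        return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    if strategy == "max":
        return [max(v[d] for v in vectors) for d in range(dim)]
    if strategy == "sum":
        return [sum(v[d] for v in vectors) for d in range(dim)]
    raise ValueError(f"unknown strategy: {strategy}")
```

Because every character receives a fixed-size pooled vector regardless of how many words match it, sentences can be padded and batched directly, which is the source of the batch-training speedup over the sequential Lattice-LSTM.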
Keywords: named entity recognition; word boundary; Python; word information; Transformer
Classification code: TP391.1 [Automation and Computer Technology - Computer Application Technology]