检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:郭志强 关东海 袁伟伟[1] GUO Zhiqiang;GUAN Donghai;YUAN Weiwei(School of Computer Science and Technology,Nanjing University of Aeronautics and Astronautics,Nanjing 211106,China)
机构地区:[1]南京航空航天大学计算机科学与技术学院,南京211106
出 处:《计算机科学》2024年第8期272-280,共9页Computer Science
基 金:航空基金(ASFC-20200055052005)。
摘 要:中文命名实体识别(CNER)任务是一种自然语言处理技术,旨在识别文本中具有特定类别的实体,如人名、地名、组织机构名等,它是问答系统、机器翻译、信息抽取等自然语言应用的基础底层任务。由于中文不具备类似英文这样的天然分词结构,基于词的NER模型在中文命名实体识别上的效果会因分词错误而显著降低,基于字符的NER模型又忽略了词汇信息的作用,因此,近年来许多研究开始尝试将词汇信息融入字符模型中。WC-LSTM通过在词汇的开始字符和结束字符中注入词汇信息,使模型性能获得了显著的提升。然而,该模型依然没有充分利用词汇信息,因此在其基础上提出了基于字词融合的低词汇信息损失NER模型LLL-WCM,对词汇的所有中间字符融入词汇信息,避免了词汇信息损失。同时,引入了两种编码策略平均(avg)和自注意力机制(self-attention)以提取所有词汇信息。在4个中文数据集上进行实验,结果表明,与WC-LSTM相比,该方法的F1值分别提升了1.89%,0.29%,1.10%和1.54%。Chinese named entity recognition(CNER)task is a natural language processing technique that aims to recognize entities with specific categories in text,such as names of people,places,organizations.It is a fundamental underlying task of natural language applications such as question and answer systems,machine translation,and information extraction.Since Chinese does not have a natural word separation structure like English,the effectiveness of word-based NER models for Chinese named entity recognition is significantly reduced by word separation errors,and character-based NER models ignore the role of lexical information.In recent years,many studies have attempted to incorporate lexical information into character-based models,and WC-LSTM has achieved significant improvements in model performance by injecting lexical information into the start and end characters of a word.However,this model still does not fully utilize lexical information,so based on it,LLL-WCM(word-character model with low lexical information loss)is proposed to incorporate lexical information for all intermediate characters of the lexicon to avoid lexical information loss.Meanwhile,two encoding strategies average and self-attention mechanism are introduced to extract all lexical information.Experiments are conducted on four Chinese datasets,and the results show that the F1 values of this method are improved by 1.89%,0.29%,1.10%and 1.54%,respectively,compared with WC-LSTM.
关 键 词:命名实体识别 自然语言处理 词汇信息损失 中间字符 编码策略
分 类 号:TP391[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:18.191.158.217