Authors: YANG Changpei; LIAO Liefa [1,2]
Affiliations: [1] School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China; [2] School of Software Engineering, Jiangxi University of Science and Technology, Nanchang 330000, China
Source: Computer Engineering (《计算机工程》), 2023, No. 8, pp. 85-95 (11 pages)
Funding: National Natural Science Foundation of China (71462018, 71761018).
Abstract: In Chinese Named Entity Recognition (NER), Long Short-Term Memory (LSTM) network models, which have a recurrent structure, address the long-distance dependency problem by capturing temporal features, but they capture features in only one way and their ability to acquire information is limited. A Convolutional Neural Network (CNN) processes text in parallel with multiple convolutional layers, which speeds up computation and captures the spatial features of the text. However, simply stacking convolutional layers easily leads to vanishing gradients. To obtain multi-dimensional text features while mitigating the vanishing gradient problem, this paper proposes a Chinese NER model based on RoBERTa-wwm-DGCNN-BiLSTM-BMHA-CRF. First, the pre-trained language model RoBERTa-wwm, which is based on whole-word masking, represents the text as character-level embedding vectors that capture deep contextual semantic information. Second, a gating mechanism and a residual structure are used to improve the dilated CNN and reduce the risk of vanishing gradients; a Bi-directional Long Short-Term Memory (BiLSTM) network and the resulting Dilated Gated CNN (DGCNN) then capture the temporal and spatial features of the text, respectively. Third, a Bi-linear Multi-Head Attention (BMHA) mechanism dynamically fuses the multi-dimensional text features. Finally, a Conditional Random Field (CRF) constrains the results to obtain the best tag sequence. Experimental results show that the proposed model achieves F1 values of 97.20%, 74.28%, and 95.74% on the Resume, Weibo, and MSRA datasets, respectively, demonstrating its effectiveness for Chinese NER.
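The abstract says that a gating mechanism and a residual structure are added to the dilated CNN, but it does not give the equations. The sketch below shows one common way such a block can be written, and it is an assumption rather than the paper's exact layer: a single-channel, same-padded dilated convolution `h`, a second convolution passed through a sigmoid to form a gate `g`, and a residual output `y = x + g * h`. The function names `dilated_conv1d` and `gated_dilated_block`, the single-channel simplification, and the kernel values are all hypothetical and chosen only for illustration.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Same-padded 1-D dilated convolution over a list of floats.

    Assumes an odd-length kernel; positions outside the sequence
    are treated as zeros (zero padding).
    """
    half = (len(kernel) // 2) * dilation
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - half + j * dilation  # taps are `dilation` steps apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_dilated_block(x, feat_kernel, gate_kernel, dilation):
    """Gated dilated convolution with a residual (skip) connection.

    y = x + sigmoid(conv_gate(x)) * conv_feat(x)
    The skip path passes x through unchanged, so the signal (and its
    gradient) has a direct route around the convolution.
    """
    h = dilated_conv1d(x, feat_kernel, dilation)
    g = [sigmoid(v) for v in dilated_conv1d(x, gate_kernel, dilation)]
    return [xi + gi * hi for xi, gi, hi in zip(x, g, h)]


if __name__ == "__main__":
    seq = [1.0, 2.0, 3.0, 4.0, 5.0]
    # Stack blocks with growing dilation to widen the receptive field.
    for d in (1, 2, 4):
        seq = gated_dilated_block(seq, [0.1, 0.2, 0.1], [0.0, 0.5, 0.0], d)
    print(seq)
```

In a full model each position would carry a vector of channels rather than a single float, and the kernels would be learned; the scalar version here only makes the gating and skip paths easy to follow.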
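Keywords: named entity recognition; RoBERTa-wwm model; dilated convolution; attention mechanism; feature fusion
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]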