Authors: JIANG Li-yuan (蒋丽媛); WU Ya-dong (吴亚东); WANG Shu-hang (王书航); ZHANG Wei-han (张巍瀚); LI Yi (李懿)
Affiliation: [1] College of Computer Science and Engineering, Sichuan University of Science and Engineering, Yibin 644000, China
Source: Science Technology and Engineering (《科学技术与工程》), 2023, No. 17, pp. 7436-7443 (8 pages)
Funding: Talent Introduction Project of Sichuan University of Science and Engineering (2020RC20).
Abstract: Chinese characters are pictographs, and their glyph features play an important role in Chinese named entity recognition (NER). To address the low NER accuracy obtained when a bi-directional long short-term memory (BiLSTM) model is used to extract radical features, a stroke composition encoder is proposed to capture the glyph features of Chinese characters. The stroke-based glyph feature vectors are concatenated with the character vectors output by the pre-trained language representation model BERT (bidirectional encoder representation from transformers), and the concatenated vectors are fed into a tagging model that connects a BiLSTM to a conditional random field (CRF), i.e. BiLSTM-CRF, for named entity recognition. Experiments show that the proposed method significantly improves NER accuracy on the Resume dataset. Compared with using a convolutional neural network as the encoder to extract glyph features, accuracy is 0.4% higher. Compared with the model using BiLSTM-extracted radical features and with the lexicon-augmented long short-term memory model (Lattice LSTM), accuracy improves by 4.2% and 0.8%, respectively.
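Keywords: glyph features; Chinese named entity recognition; BiLSTM-CRF; stroke composition encoder; dynamic word vectors
Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]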
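
The feature-fusion step described in the abstract (a stroke-based glyph vector concatenated with a per-character vector before sequence tagging) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the five-class stroke inventory, the averaging encoder, the vector sizes, and the names `make_stroke_table`, `encode_strokes`, and `concat_features` are all hypothetical, and the BERT output is replaced by a placeholder vector. The BiLSTM-CRF tagger that would consume the fused vectors is omitted.

```python
import random

# Assumed stroke inventory: five basic stroke classes (hypothetical choice).
STROKE_TYPES = ["heng", "shu", "pie", "na", "zhe"]


def make_stroke_table(dim, seed=0):
    """Build a random embedding for each stroke class (stand-in for learned weights)."""
    rng = random.Random(seed)
    return {s: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for s in STROKE_TYPES}


def encode_strokes(strokes, table):
    """Average the stroke embeddings of one character.

    Averaging is only a placeholder for the paper's stroke composition
    encoder, whose internals the abstract does not describe.
    """
    dim = len(next(iter(table.values())))
    if not strokes:
        return [0.0] * dim
    total = [0.0] * dim
    for s in strokes:
        for i, v in enumerate(table[s]):
            total[i] += v
    return [v / len(strokes) for v in total]


def concat_features(char_vec, stroke_vec):
    """Concatenate the character vector with the glyph vector."""
    return list(char_vec) + list(stroke_vec)


# Example: the character "大" is written heng, pie, na.
table = make_stroke_table(dim=4)
glyph_vec = encode_strokes(["heng", "pie", "na"], table)
char_vec = [0.0] * 8  # placeholder for a BERT character vector
fused = concat_features(char_vec, glyph_vec)
```

In a full model, one such fused vector per character would form the input sequence to the tagger; the dimensions here are kept tiny only for readability.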