Chinese named entity recognition based on lexical enhancement and adversarial training

Authors: YANG Jun-hui (杨竣辉) [1]; LIU Bao-bing (刘保冰) (School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China)

Affiliation: [1] School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2024, No. 12, pp. 3712-3718 (7 pages)

Funding: National Natural Science Foundation of China (61273328).

Abstract: Existing Chinese named entity recognition methods capture Chinese word-level feature information poorly, and their models are easily affected by noise and therefore unstable. To address these problems, a Chinese named entity recognition method based on lexical enhancement and adversarial training is proposed. The input text is passed through a lexical enhancement module to obtain lexical vectors, and the character-level embedding vectors produced by a pre-trained model are fused with these lexical vectors (character-word fusion). The fused embedding vectors are then used to generate adversarial samples by the MOA method. A BiGRU extracts the semantic encoding, and a CRF decodes it to obtain the prediction results. Experimental results show that the method reaches F1 values of 97.14% and 73.65% on the Chinese named entity recognition datasets Resume and Traditional Chinese Medicine Instructions (中药说明书), respectively, verifying the effectiveness of the model.
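The fusion and adversarial-sample steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the character and lexical vectors are fused, nor does it define the MOA method, so everything below is an assumption made for illustration: fusion is shown as plain per-character concatenation, and the perturbation is a generic gradient-normalized (FGM-style) step used as a stand-in for MOA. The function names `fuse` and `perturb` and the parameter `epsilon` are hypothetical and do not come from the paper.

```python
import math


def fuse(char_vecs, word_vecs):
    """Character-word fusion by concatenation (an assumption, not the paper's stated method).

    char_vecs[i] is the character-level embedding of character i,
    word_vecs[i] is the lexical vector aligned to the same character.
    """
    if len(char_vecs) != len(word_vecs):
        raise ValueError("one lexical vector is expected per character")
    return [list(c) + list(w) for c, w in zip(char_vecs, word_vecs)]


def perturb(embeddings, grads, epsilon=1.0):
    """FGM-style adversarial sample, used here as a stand-in for MOA.

    Adds epsilon * g / ||g||_2 to the fused embeddings, where g is the
    gradient of the loss with respect to those embeddings.
    """
    norm = math.sqrt(sum(g * g for row in grads for g in row))
    if norm == 0.0:
        # No gradient signal: return an unperturbed copy.
        return [list(row) for row in embeddings]
    return [
        [e + epsilon * g / norm for e, g in zip(e_row, g_row)]
        for e_row, g_row in zip(embeddings, grads)
    ]


# Toy example: two characters, 2-d character vectors, 1-d lexical vectors.
fused = fuse([[1.0, 2.0], [0.5, 0.5]], [[3.0], [4.0]])
# Hypothetical gradients; in training they would come from backpropagation.
adversarial = perturb(fused, [[3.0, 0.0, 0.0], [0.0, 4.0, 0.0]], epsilon=0.5)
```

In a full model, both the fused embeddings and their adversarial counterparts would be fed to the BiGRU encoder and CRF decoder, with the gradients supplied by the training framework rather than written by hand.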

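Keywords: Chinese named entity recognition; lexical enhancement; pre-trained model; character-word fusion; adversarial training; bidirectional gated recurrent unit (BiGRU); conditional random field (CRF)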

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]

 
