Chinese Medical Named Entity Recognition Based on RBAC Model  (Cited by: 1)


Authors: ZHANG Bin[1]; ZHAO Tingting[1]; ZHANG Bixia; CHEN Yarui[1]; WANG Yuan (College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin 300457, China)

Affiliation: [1] College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin 300457, China

Source: Journal of Tianjin University of Science & Technology, 2024, No. 5, pp. 56-62 (7 pages)

Funding: National Natural Science Foundation of China (61976156); Tianjin Enterprise Science and Technology Commissioner Project (20YDTPJC00560).

Abstract: Chinese medical named entity recognition aims to extract structured entities from unstructured data, and current mainstream approaches rely on large amounts of training data. To address the scarcity of training data for this task, the paper proposes an RBAC (RoBERTa-BiGRU-Attention-CRF) model based on joint word segmentation, together with a data augmentation method for named entity recognition based on semantic search. A pretrained model and a bidirectional gated recurrent unit (BiGRU) first extract a deep bidirectional semantic representation of the text, which is then fed separately into a word segmentation module and a named entity recognition module. The word segmentation module uses a conditional random field (CRF) to obtain word segmentation information. The named entity recognition module uses a BiGRU and multi-head attention to obtain a mixed semantic representation, which is then passed to a CRF to produce the tag sequence for named entity recognition. Experiments on the CCKS2019 Chinese electronic medical record dataset show that the method reaches an F1 score of 90.5% with a small amount of training data, demonstrating its effectiveness.
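The abstract says that both the word segmentation module and the named entity recognition module end in a CRF that outputs a label sequence. As a rough illustration of that final step only, the sketch below runs Viterbi decoding for a linear-chain CRF in plain Python. It is not the authors' implementation: the function name `viterbi_decode`, the three-tag B/I/O scheme, and all scores are hypothetical values chosen for the example. In the model described, the per-character emission scores would come from the upstream BiGRU and attention layers, and the transition scores would be learned.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[i][t]   : score of tag t at position i (hypothetical inputs)
    transitions[p][c] : score of moving from tag p to tag c
    """
    if not emissions:
        return []
    # score[t] = best score of any path that ends in tag t at the current position
    score = {t: emissions[0][t] for t in tags}
    backpointers: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the previous tag that maximizes the path score into `cur`.
            best_prev = max(tags, key=lambda p: score[p] + transitions[p][cur])
            new_score[cur] = score[best_prev] + transitions[best_prev][cur] + emit[cur]
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example (assumed values): "I" may not follow "O", so O -> I is heavily penalized.
TAGS = ["B", "I", "O"]
TRANSITIONS = {p: {c: 0.0 for c in TAGS} for p in TAGS}
TRANSITIONS["O"]["I"] = -1e4
EMISSIONS = [
    {"B": 1.0, "I": 0.0, "O": 1.2},
    {"B": 0.0, "I": 2.0, "O": 0.5},
    {"B": 0.0, "I": 0.0, "O": 2.0},
]
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

In this toy input, picking the best tag at each position independently would give O, I, O, which violates the B/I/O scheme. Because the decoder scores whole paths, the transition penalty steers it to B, I, O instead, which is the usual reason for placing a CRF on top of per-character scores.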

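Keywords: multi-task learning; pretrained model; bidirectional gated recurrent unit; multi-head attention; conditional random field; data augmentation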

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
