Research on Named Entity Recognition of Named TCM Clinical Medical Records Based on RoBERTa-wwm Adaptive Word Embedding  Cited by: 2


Authors: WAN Ze-yu; GONG Qing-yue [1]; LI Tie-jun; WANG Hong-yun; BAO Jian-yang [1]

Affiliations: [1] College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine; [2] The Second Affiliated Hospital of Nanjing University of Chinese Medicine (The Second Jiangsu Provincial Hospital of Chinese Medicine); [3] College of Nursing, Nanjing University of Chinese Medicine, Nanjing 210046, Jiangsu, China

Source: Software Guide (《软件导刊》), 2022, No. 12, pp. 58-62 (5 pages)

Abstract: To address problems such as missing semantics and nested named entities in Chinese medical named entity recognition (NER), and to improve entity recognition in the clinical medical records of famous TCM physicians, a NER model based on RoBERTa-wwm with adaptive word embedding is proposed. The original text of a medical record is first passed through the RoBERTa-wwm pre-trained model to obtain initial vectors, which are then dynamically fused with lexicon information by the Soft-lexicon method for lexical enhancement. The resulting text semantic vectors are fed to a downstream bidirectional long short-term memory (BiLSTM) network that learns sequence dependencies, and a conditional random field (CRF) finally decodes the sequence to extract the entities. On a dataset of clinical medical records of the famous TCM physician Li Tiejun treating cardiovascular disease, the model achieves an F1 value of 86.88%, which is 5.93% and 5.87% higher than the RoBERTa-wwm-CRF and BERT-CRF models, respectively; it is also faster. Introducing adaptive word embedding into the conventional RoBERTa-wwm model for lexical enhancement lets the model learn textual semantic information better, and compared with the other baseline models it shows a significant advantage on NER in the clinical medical records of famous TCM physicians.
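The lexical-enhancement step can be illustrated with a minimal sketch of how Soft-lexicon style word sets are typically built: for each character, every lexicon word matched in the sentence is placed in a B (begins here), M (passes through), E (ends here), or S (single-character word) set. This is only an illustration under stated assumptions, not the authors' implementation: the function name `soft_lexicon_sets`, the `max_word_len` parameter, and the toy lexicon are hypothetical, and the frequency weighting of the sets and their fusion with the RoBERTa-wwm character vectors are omitted.

```python
def soft_lexicon_sets(sentence, lexicon, max_word_len=4):
    """Return, for each character, the B/M/E/S sets of matched lexicon words.

    Hypothetical helper for illustration; `lexicon` is any set of words.
    """
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every substring of up to max_word_len characters starting here.
        for end in range(start + 1, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                sets[start]["S"].add(word)      # single-character word
                continue
            sets[start]["B"].add(word)          # word begins at this character
            sets[end - 1]["E"].add(word)        # word ends at this character
            for mid in range(start + 1, end - 1):
                sets[mid]["M"].add(word)        # character lies inside the word
    return sets


if __name__ == "__main__":
    # Toy lexicon (an assumption): coronary heart disease, angina, etc.
    toy_lexicon = {"冠心病", "心绞痛", "心病", "心"}
    for char, groups in zip("冠心病心绞痛", soft_lexicon_sets("冠心病心绞痛", toy_lexicon)):
        print(char, {tag: sorted(words) for tag, words in groups.items()})
```

In a full model, each of the four sets would be pooled into a fixed-size vector and concatenated with the character's contextual vector before the BiLSTM-CRF layers; how the paper performs that fusion is not stated in this record.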

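Keywords: information extraction; named entity recognition; clinical medical records of famous TCM physicians; RoBERTa-wwm; lexical enhancement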

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
