Authors: WAN Ze-yu, GONG Qing-yue[1], LI Tie-jun, WANG Hong-yun, BAO Jian-yang[1]
Affiliations: [1] College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine; [2] The Second Affiliated Hospital of Nanjing University of Chinese Medicine (The Second Jiangsu Provincial Hospital of Chinese Medicine); [3] College of Nursing, Nanjing University of Chinese Medicine; Nanjing 210046, Jiangsu, China
Source: Software Guide (《软件导刊》), 2022, Issue 12, pp. 58-62 (5 pages)
Abstract: To address semantic loss and nested named entities in Chinese medical named entity recognition (NER), and to improve entity recognition in the clinical records of renowned Traditional Chinese Medicine (TCM) physicians, this paper proposes an NER model for such records based on RoBERTa-wwm with adaptive word embedding. The original record text is first encoded by the pre-trained RoBERTa-wwm model. The Soft-Lexicon method then dynamically fuses lexicon information into these initial vectors for vocabulary enhancement, producing semantic text vectors. A downstream bidirectional long short-term memory network (BiLSTM) learns sequential dependencies, and a conditional random field (CRF) layer decodes its output to extract entities. On a dataset of clinical records from the renowned TCM physician Li Tiejun's treatment of cardiovascular diseases, the model achieves an F1 score of 86.88%, which is 5.93% and 5.87% higher than the RoBERTa-wwm-CRF and BERT-CRF models, respectively, and it is also faster. Adding adaptive word embedding for vocabulary enhancement to the standard RoBERTa-wwm model lets it learn textual semantics better, giving it a clear advantage over the other baseline models on NER for the clinical records of renowned TCM physicians.
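The abstract names Soft-Lexicon as the vocabulary-enhancement step but does not spell out its mechanics. The sketch below assumes the commonly described form of Soft-Lexicon: each character gets four sets of matched lexicon words (B: words it begins, M: words it is inside, E: words it ends, S: single-character words), each set is pooled with weights 4·z(w)/Z, where z(w) is a word's frequency and Z is the total frequency over all four sets, and the four pooled vectors are concatenated onto the character's contextual vector. The lexicon, frequencies, toy 2-dimensional word vectors, the `max_len` limit, and the function names `bmes_sets`, `soft_lexicon_vector`, and `fuse` are all hypothetical. The RoBERTa-wwm encoder, BiLSTM, and CRF are not shown, and this is not the authors' implementation.

```python
from typing import Dict, List, Sequence, Set

TAGS = "BMES"


def bmes_sets(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Dict[str, List[str]]]:
    """For every character, collect the lexicon words in which it is the
    Beginning, Middle, or End, or which it forms as a Single character."""
    n = len(sentence)
    sets = [{tag: [] for tag in TAGS} for _ in range(n)]
    for start in range(n):
        # max_len is an assumed cap on word length, not taken from the paper
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if end - start == 1:
                sets[start]["S"].append(word)
            else:
                sets[start]["B"].append(word)
                sets[end - 1]["E"].append(word)
                for mid in range(start + 1, end - 1):
                    sets[mid]["M"].append(word)
    return sets


def soft_lexicon_vector(char_sets: Dict[str, List[str]],
                        word_vecs: Dict[str, Sequence[float]],
                        freq: Dict[str, int],
                        dim: int) -> List[float]:
    """Frequency-weighted pooling of each BMES set, concatenated into a
    4*dim vector. Empty sets contribute zeros (a simplification; a learned
    'NONE' embedding is another option)."""
    # Assumption: words missing from `freq` get frequency 1.
    z_total = sum(freq.get(w, 1) for tag in TAGS for w in char_sets[tag])
    if z_total == 0:
        return [0.0] * (4 * dim)
    out: List[float] = []
    for tag in TAGS:
        pooled = [0.0] * dim
        for w in char_sets[tag]:
            weight = 4.0 * freq.get(w, 1) / z_total
            for d in range(dim):
                pooled[d] += weight * word_vecs[w][d]
        out.extend(pooled)
    return out


def fuse(char_vec: Sequence[float], lex_vec: Sequence[float]) -> List[float]:
    """Concatenate a contextual character vector (e.g. from RoBERTa-wwm)
    with its lexicon vector; a BiLSTM-CRF would consume this (not shown)."""
    return list(char_vec) + list(lex_vec)


if __name__ == "__main__":
    sets = bmes_sets("冠心病胸闷", {"冠心病", "冠心", "心病", "胸闷"})
    print(sets[1])  # {'B': ['心病'], 'M': ['冠心病'], 'E': ['冠心'], 'S': []}
```

For the character 心 in 冠心病胸闷 ("coronary heart disease, chest tightness"), the longer word 冠心病 lands in its M set, which is how this kind of lexicon feature lets a single character see an entity that spans it, the situation the abstract describes as nesting.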
Keywords: information extraction; named entity recognition; clinical records of renowned TCM physicians; RoBERTa-wwm; vocabulary enhancement
CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]