Research on a Chinese Medical Named Entity Recognition Model Incorporating Lexical Information

Authors: 陈晶 (CHEN Jing); 孙亚轩 (SUN Yaxuan); 邢珂萱 (XING Kexuan) (School of Electronics and Information Engineering, Guangdong Ocean University, Zhanjiang 524088; School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004; Key Laboratory of Virtual Technology and System Integration, Yanshan University, Qinhuangdao 066004)

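Affiliations: [1] School of Mathematics and Computer Science, Guangdong Ocean University, Zhanjiang 524088; [2] School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004; [3] Hebei Key Laboratory of Virtual Technology and System Integration, Qinhuangdao 066004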

Source: Chinese High Technology Letters (《高技术通讯》), 2024, No. 10, pp. 1058-1069 (12 pages)

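Funding: Supported by the National Natural Science Foundation of China (62172352, 61871465, 42306218); the Central Government Guided Local Science and Technology Development Fund (226Z0102G, 226Z0305G); the Natural Science Foundation of Hebei Province (2022203028); and the Guangdong Ocean University Scientific Research Start-up Fund (060302102304).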

Abstract: Medical texts contain a large number of specialized terms, so they are more prone than general-domain texts to word segmentation errors and out-of-vocabulary words. These problems cause a loss of contextual semantics and reduce the accuracy of named entity recognition (NER). To address them, this paper proposes WI-NER, a Chinese medical named entity recognition model based on gated recurrent units that incorporates lexical information. First, based on the characteristics of Chinese medical datasets, the paper describes the task definition, entity position labels, and entity category labels for NER in the Chinese medical domain; in the embedding layer, the model performs feature embedding and vector fusion for characters that match specialized terms. Second, a lexical gating unit is added to the context encoding layer, which uses the memory and forgetting mechanisms of recurrent neural networks to automatically extract the features needed for entity recognition; introducing lexical information and prior knowledge improves the recognition of Chinese medical named entities. Finally, the model is evaluated on three datasets. The results show that the proposed model outperforms the baseline models in accuracy and exhibits the expected medical-domain characteristics.
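The step of matching characters against specialized terms can be illustrated with a minimal sketch. This is not the WI-NER implementation, which the abstract does not detail; it assumes a simple scheme in which each character records the lexicon words that cover it, grouped by the character's position in the word (B = begin, M = middle, E = end, S = single). The function name `match_lexicon`, the `max_len` parameter, and the toy lexicon and sentence are all hypothetical.

```python
from collections import defaultdict


def match_lexicon(sentence, lexicon, max_len=6):
    """For each character, collect the lexicon words that contain it,
    grouped by the character's position in the word (B/M/E/S).

    Hypothetical helper for illustration; not taken from the paper.
    """
    features = [defaultdict(set) for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every substring of up to max_len characters starting here.
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                features[start]["S"].add(word)
                continue
            features[start]["B"].add(word)
            features[end - 1]["E"].add(word)
            for mid in range(start + 1, end - 1):
                features[mid]["M"].add(word)
    return features


# Toy lexicon and sentence, assumed for illustration only.
lexicon = {"糖尿病", "胰岛", "胰岛素"}
sentence = "糖尿病患者注射胰岛素"
feats = match_lexicon(sentence, lexicon)
for ch, f in zip(sentence, feats):
    print(ch, {k: sorted(v) for k, v in f.items()})
```

In a full model, each matched word set would typically be mapped to word embeddings and fused with the character embedding before the context encoder; that fusion step is omitted here because the abstract does not specify how it is done.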

Keywords: Chinese medical named entity recognition; prior knowledge; embedding layer; gating unit; lexical information

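Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]; R-05 [Automation and Computer Technology: Computer Science and Technology]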
