Research and Development of Named Entity Recognition in Chinese Electronic Medical Record  Cited by: 17


Authors: DU Jin-hua (杜晋华), YIN Hao (尹浩)[1], FENG Song (冯嵩)[2]

Affiliations: [1] Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China; [2] Xiangya Hospital of Central South University, Changsha, Hunan 410008, China

Source: Acta Electronica Sinica (《电子学报》), 2022, No. 12, pp. 3030-3053 (24 pages)

Funding: National Key Research and Development Program of China (No. 2020YFC2005003); National Natural Science Foundation of China (No. 92067206, No. 61972222, No. 62102217).

Abstract: Massive electronic medical record (EMR) data is an important raw material for research on intelligent healthcare, but the semi-structured or even unstructured nature of EMR text makes subsequent analysis and use extremely difficult. Although deep-learning-based named entity recognition (NER) has in recent years become a core technology for automated information extraction from EMRs, the task still faces many challenges because of the distinctive textual characteristics of Chinese electronic medical records (CEMR): the non-standard and specialized nature of medical record text, the uniqueness of medical entities, and the scarcity of annotated corpora. This paper surveys the research and progress of NER in CEMR. It systematically reviews the concept of NER, the related theoretical models, and the main factors limiting recognition accuracy and efficiency in CEMR; analyzes in detail the evolution of CEMR NER methods from the perspective of technical development; and experimentally verifies and analyzes NER performance on CEMR, pointing out the shortcomings of existing models and directions for improvement. Because CCKS, a domestic evaluation conference related to Chinese information processing, has continued to focus on CEMR NER in recent years, the paper also presents a longitudinal and cross-sectional comparison of all representative CCKS evaluation papers in this field over the past five years and, through in-depth experiments on mainstream models, seeks ideas for further advancing the field.

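Keywords: Chinese electronic medical record; named entity recognition; deep learning; pre-trained model; natural language processing; medical informatization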

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
