Intelligent Recognition of Named Entity in Electronic Medical Records  Cited by: 47

Original title: 电子病历中命名实体的智能识别


Authors: Ye Feng (叶枫)[1], Chen Yingying (陈莺莺)[1], Zhou Gengui (周根贵)[1], Li Haomin (李昊旻)[2], Li Ying (李莹)[2]

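Affiliations: [1] College of Economics and Management, Zhejiang University of Technology, Hangzhou 310023 [2] College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027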

Source: Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》), 2011, No. 2, pp. 256-262 (7 pages)

Abstract: Recognizing named entities in electronic medical records is important for building and mining large-scale clinical databases to support clinical decision-making, yet relatively little research on this has been done in China. After comparing existing entity recognition methods and models, this paper uses a machine learning method based on the conditional random field (CRF) model to recognize three types of named entity that are common in Chinese medical records: diseases, clinical symptoms, and surgical operations. First, by analyzing the data characteristics of electronic medical records, a feature set is chosen that consists of linguistic symbols, part of speech, word formation pattern, word boundaries, and context. Then, a small-scale corpus is constructed and annotated from electronic medical records randomly sampled from multiple clinical hospital departments. Finally, three controlled experiments are carried out with CRF++, a tool that implements the CRF algorithm. By analyzing step by step how the different features in the feature set affect automatic recognition by the CRF, the paper proposes some basic rules for CRF feature selection and template design for Chinese medical records. In the controlled experiments, the method achieved good results: the best F-measures for the three entity types reached 92.67%, 93.76%, and 95.06%, respectively.
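The pipeline outlined in the abstract (annotate a corpus, describe features in a CRF++ template, score with the F-measure) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' setup: the BIO tag scheme, the `OPE` type code, the two-column feature layout (word, part of speech), the sample sentence, and the template lines are all hypothetical, since the abstract does not specify them. Only the general CRF++ conventions are relied on: whitespace-separated columns with the label last, and `%x[row,col]` template macros.

```python
from typing import List, Set, Tuple

# (start, end_exclusive, entity_type); the type codes are hypothetical.
Span = Tuple[int, int, str]

# Illustrative CRF++ template: unigram features over a +/-1 token window
# on column 0 (word) and column 1 (part of speech), plus a label bigram.
# The paper's actual templates are not given in the abstract.
TEMPLATE = "\n".join([
    "U00:%x[-1,0]",
    "U01:%x[0,0]",
    "U02:%x[1,0]",
    "U03:%x[0,1]",
    "B",
])


def bio_tags(n: int, spans: List[Span]) -> List[str]:
    """Turn entity spans into one BIO label per token (assumed tag scheme)."""
    tags = ["O"] * n
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def to_crfpp_rows(tokens: List[Tuple[str, str]], spans: List[Span]) -> List[str]:
    """Build tab-separated training rows: word, part of speech, label."""
    tags = bio_tags(len(tokens), spans)
    return [f"{word}\t{pos}\t{tag}" for (word, pos), tag in zip(tokens, tags)]


def f_measure(gold: Set[Span], pred: Set[Span]) -> float:
    """Balanced F-measure over exactly matching entity spans."""
    true_pos = len(gold & pred)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical segmented sentence with one surgical-operation entity.
tokens = [("患者", "n"), ("行", "v"), ("阑尾", "n"), ("切除术", "n")]
rows = to_crfpp_rows(tokens, [(2, 4, "OPE")])
```

Joining `rows` with newlines, and separating sentences with a blank line, yields a file in the column format that CRF++ reads for training. Further columns (for example a word-boundary or word-formation feature) would be inserted before the label column and referenced from the template by column index.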

Keywords: electronic medical records; named entity recognition; machine learning; conditional random field

Classification code: R318 [Medicine and Health: Biomedical Engineering]

 
