Authors: 许源 (XU Yuan); 葛艳秋 (GE Yan-Qiu); 王强 (WANG Qiang); 熊刚 (XIONG Gang); 易应萍 (YI Ying-Ping)
Affiliations: [1] Clinical Big Data Research Center, The Second Affiliated Hospital of Nanchang University, Nanchang 330006, Jiangxi, China; [2] Department of Public Health, School of Medicine, Nanchang University, Nanchang 330006, Jiangxi, China; [3] HBT Medical Information Company, Suzhou 215000, Jiangsu, China; [4] Department of Science and Education, The Second Affiliated Hospital of Nanchang University, Nanchang 330006, Jiangxi, China
Source: Journal of Sun Yat-Sen University: Medical Sciences (《中山大学学报(医学版)》), 2018, Issue 3, pp. 455-462 (8 pages)
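Funding: Science and Technology Innovation Platform program of the Jiangxi Provincial Department of Science and Technology (20171BCD40024); General Project of the Jiangxi Provincial Department of Science and Technology (20171BBH80025)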
Abstract: 【Objective】To study the construction and optimization of a natural language processing model for unstructured clinical electronic medical records, and to use the model to extract structured data from the records of stroke patients in the Jiangxi Provincial Medical Big Data Platform. 【Methods】A total of 500 admission records of stroke patients from 2011 to 2016 were randomly selected from the Jiangxi Provincial Medical Big Data Platform. Based on the practical needs of clinical research, a named entity annotation scheme and an annotated named entity corpus for stroke patients were built. The corpus was used to build a named entity extraction model based on conditional random fields (CRF) and RUTA rules, and recognition accuracy was improved by adjusting the RUTA rules and parameters. 【Results】Under five-fold cross-validation, the model extracted medical named entities with a precision of 0.960, a recall of 0.916, and an F-score of 0.939. Applied to 10 295 admission records of stroke patients on the platform, the model extracted 264 580 named entities and 1 161 077 named entity modifiers. 【Conclusions】The constructed extraction model achieves high recognition accuracy. It can accurately obtain valuable research data such as past history, life history, and clinical manifestations from large volumes of unstructured medical records, and can effectively improve the efficiency and quality of clinical research on cardiovascular and cerebrovascular diseases.
Keywords: Chinese electronic medical records; named entity recognition; conditional random field (CRF); stroke
Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]