Authors: Yuan Ni (原旎), Lu Kezhi (卢克治), Yuan Yuhu (袁玉虎)[1], Shu Zixin (舒梓心), Yang Kuo (杨扩), Zhang Runshun (张润顺)[3], Li Xiaodong (李晓东)[2], Zhou Xuezhong (周雪忠)[1]
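Affiliations: [1] College of Computer Science and Information Technology, Beijing Jiaotong University, Beijing 100044, China; [2] Hubei Hospital of Traditional Chinese Medicine, Wuhan 430061, China; [3] Guang'anmen Hospital, Chinese Academy of Chinese Medical Sciences, Beijing 100053, China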
Source: Modernization of Traditional Chinese Medicine and Materia Medica-World Science and Technology (《世界科学技术-中医药现代化》), 2018, No. 3, pp. 355-362 (8 pages)
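Funding: State Administration of Traditional Chinese Medicine, 2015 National TCM Clinical Research Base Operational Construction, second batch of special research projects (JDZX2015171): "Research on methods for extracting and analyzing phenotype information from retrospective liver disease cases", principal investigator Zhou Xuezhong; Ministry of Science and Technology, National Key R&D Program of China (2017YFC1703506): "Research and innovative application of TCM big data mining", principal investigator Yu Jian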
Abstract: Objective: Named entity recognition is one of the most basic tasks in natural language processing. This paper applies deep representation methods to the automatic annotation of clinical history-of-present-illness data. Methods: A total of 10,426 history-of-present-illness sentences were randomly selected as the main text corpus. Two vector construction methods, word embedding (word2vec) and network structure features (node2vec), were used to generate different word vector features. Conditional random field (CRF) and structured support vector machine (SSVM) models were then evaluated with 10-fold cross validation, and the performance of deep-representation-based named entity extraction for symptom phenotypes was computed and compared using three metrics: precision (P), recall (R), and F1-score (F1). Results: The classic CRF algorithm achieved P = 0.8889, R = 0.7869, F1 = 0.8348. Under the WENER method, CRF achieved P = 0.9750, R = 0.9849, F1 = 0.9798, and SSVM achieved P = 0.9928, R = 0.9889, F1 = 0.9908. Under the GENER method, word-based CRF achieved P = 0.9728, R = 0.9768, F1 = 0.9752, and word-based SSVM achieved P = 0.9833, R = 0.9745, F1 = 0.9788; character-based CRF achieved P = 0.9278, R = 0.8628, F1 = 0.8879, and character-based SSVM achieved P = 0.9437, R = 0.9468, F1 = 0.9413. Conclusion: Deep-representation-based named entity extraction of symptom phenotypes outperforms the classic CRF algorithm without deep representation. In addition, for both word2vec-based and node2vec-based word vectors, the SSVM model outperforms the CRF model.
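As a quick sanity check on the reported metrics, F1 is conventionally the harmonic mean of precision and recall. The sketch below is not from the paper; `f1_score` is a hypothetical helper introduced only for illustration, and it recomputes F1 for the classic CRF baseline from the P and R values given in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Classic CRF baseline values as reported in the abstract.
baseline_f1 = f1_score(0.8889, 0.7869)
print(round(baseline_f1, 4))  # prints 0.8348, matching the reported F1
```

Not every reported F1 is the harmonic mean of its reported P and R (the character-based SSVM F1 of 0.9413 is lower than both). One possible explanation, stated here only as an assumption, is that each metric was averaged separately over the 10 folds.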
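Keywords: conditional random field; structured support vector machine; named entity extraction; traditional Chinese medicine medical records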
Classification: R33 [Medicine and Health: Human Physiology]