Authors: XIA Dong [1,2]; LI Guo-lei; CHEN Xian-lai [2,4]
Affiliations: [1] Chengdu Literature and Information Center, Chinese Academy of Sciences, Chengdu 610000, Sichuan Province, China; [2] Institute of Information Security and Big Data, Central South University, Changsha 410083, Hunan Province, China; [3] Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China; [4] Key Laboratory of Medical Information Research, Hunan General Colleges and Universities, Changsha 410013, Hunan Province, China
Source: Chinese Journal of Medical Library and Information Science (《中华医学图书情报杂志》), 2018, No. 2, pp. 63-68 (6 pages)
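Funding: One of the research outputs of the National Social Science Fund of China project "Latent semantic analysis of electronic medical records for clinical decision-making and its applications" (13BTQ052)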
Abstract: Objective: To explore effective text mining methods by mining the information in the text of electronic medical records (EMR), with the aim of realizing the decision-support value of EMR. Methods: A total of 2500 EMR of gastric cancer patients were randomly divided into a training group (n = 1500) and a testing group (n = 1000). The text of the training-group records was segmented into words using a dictionary combined with statistical methods. The segmented words were clustered according to the co-occurrence frequency of each word with the treatment plan extracted from the records. For each training-group record, the number of words matching each cluster was counted, and a Bayes discriminant function was built from these per-cluster match counts and the treatment plans to serve as the decision-support model. The model was then validated on the testing-group records, and both the word segmentation method and the discriminant model were evaluated. Results: In 50 randomly selected EMR, word segmentation achieved a recall of 74.24%, a precision of 82.30% and an F-1 value of 78.06%. When the segmented words were clustered into 5 categories, the discriminant model classified the testing-group records with an accuracy of 62%. Conclusion: Word segmentation using a dictionary combined with statistical methods performs well on EMR text. Cluster-based text mining of EMR can realize the decision-support value of the records, but the accuracy of the resulting decision-support model is not high. Further study is needed on word segmentation of EMR text and on the handling of segmented words during model building.
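The two quantities the abstract leans on can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the toy tokens and the word-to-cluster mapping are hypothetical, and the record does not state how the Bayes discriminant function itself was fitted, so that step is left out. The first function turns a segmented record into per-cluster match counts (the model input described in the Methods); the second recomputes the F-1 value from the reported precision and recall, assuming the usual harmonic-mean definition.

```python
def cluster_match_counts(tokens, word_to_cluster, n_clusters):
    """Count how many tokens of one record fall into each word cluster.

    tokens          -- segmented words of a single record (hypothetical input)
    word_to_cluster -- mapping from word to cluster index (hypothetical)
    n_clusters      -- number of clusters (the abstract reports 5)
    """
    counts = [0] * n_clusters
    for token in tokens:
        cluster = word_to_cluster.get(token)
        if cluster is not None:  # words outside every cluster are ignored
            counts[cluster] += 1
    return counts


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed F-1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Toy record and mapping, invented purely for illustration.
toy_tokens = ["pain", "surgery", "pain", "unknown"]
toy_mapping = {"pain": 0, "surgery": 1}
features = cluster_match_counts(toy_tokens, toy_mapping, n_clusters=2)

# Precision and recall as reported in the abstract.
f1_percent = round(f1_score(0.8230, 0.7424) * 100, 2)
```

With the reported precision (82.30%) and recall (74.24%), the harmonic mean rounds to 78.06%, which agrees with the F-1 value stated in the Results.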
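Keywords: word segmentation; cluster analysis; Bayes discrimination; electronic medical records; clinical decision support; gastric cancer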
Classification: R197.323 [Medicine and Health: Health Services Administration]; R735.7 [Medicine and Health: Public Health and Preventive Medicine]