Authors: XIAO Xiaoxia (肖晓霞); LIU Mingting (刘明婷); YANG Fengtianci (杨冯天赐); LIU Jianjianxian (刘鉴建县); YANG Yang (杨阳); SHI Yue (石月)
Affiliations: [1] School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China; [2] College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; [3] College of Chemistry, Xiangtan University, Xiangtan 411105, China; [4] Hunan Zeta Technology Co., Ltd., Changsha 410012, China; [5] College of Engineering and Technology, Northeast Forestry University, Harbin 150040, China; [6] Beijing Ruidi Hongxin Science and Trade Co., Ltd., Beijing 100071, China
Source: Big Data Research (《大数据》), 2022, No. 3, pp. 128-139 (12 pages)
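Funding: National Key Research and Development Program of China (No. 2017YFC1703300); Discipline Open Fund of the School of Informatics, Hunan University of Chinese Medicine (No. 2018DK02).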
Abstract: Traditional Chinese medicine (TCM) medical records are important documents from which TCM doctors learn clinical experience. Structuring these records makes it easier to summarize clinical experience with machine learning and other methods, which can accelerate the inheritance of TCM. To structure TCM medical records quickly, a text structuring method based on natural language processing (NLP) is proposed. Essence of Chinese Modern Famous Chinese Medical Records (《中国现代名中医医案精粹》) was selected as the corpus to be structured; the text in screenshots of the medical records was recognized by optical character recognition (OCR) and given an initial structure. A simple symptom dictionary was constructed, and an improved N-gram model combined with the dictionary was used to extract symptoms, signs, and similar terms from the text, with the dictionary updated during the structuring process. In total, 4754 text medical records were structured. The final model was tested on 666 medical records selected randomly from the corpus, and its F1 value reached 82.99%.
Keywords: N-gram model; natural language processing; TCM medical records; Chinese word segmentation; optical character recognition
Classification No.: TP391.1 [Automation and Computer Technology: Computer Application Technology]
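
Note on the method: the abstract describes combining a symptom dictionary with an N-gram model and updating the dictionary during structuring, but this record does not give the details of the improved model. The following is only a minimal sketch of one way such a loop could look, not the authors' implementation. The function names, the forward maximum matching step, the substring filter, and the `min_count` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

def split_clauses(text):
    """Split a record on common Chinese/ASCII punctuation and whitespace."""
    return [c for c in re.split(r"[,。;、:,.;:\s]+", text) if c]

def match_dictionary(clause, dictionary, max_len=6):
    """Forward maximum matching: return (dictionary terms, unmatched fragments)."""
    terms, leftovers, buf, i = [], [], "", 0
    while i < len(clause):
        for n in range(min(max_len, len(clause) - i), 0, -1):
            if clause[i:i + n] in dictionary:
                terms.append(clause[i:i + n])
                if buf:
                    leftovers.append(buf)
                    buf = ""
                i += n
                break
        else:  # no dictionary term starts here; keep the character for n-gram counting
            buf += clause[i]
            i += 1
    if buf:
        leftovers.append(buf)
    return terms, leftovers

def ngram_candidates(fragments, max_n=4, min_count=2):
    """Count character n-grams (n = 2..max_n) and keep the frequent ones that are
    not contained in a longer n-gram with the same count (assumed filter)."""
    counts = Counter(
        frag[i:i + n]
        for frag in fragments
        for n in range(2, max_n + 1)
        for i in range(len(frag) - n + 1)
    )
    frequent = {g: c for g, c in counts.items() if c >= min_count}
    return {
        g for g, c in frequent.items()
        if not any(g != h and g in h and c == k for h, k in frequent.items())
    }

def structure_records(records, seed_dictionary, min_count=2):
    """Pass 1 collects unmatched text; frequent n-grams extend the dictionary;
    pass 2 extracts terms from every record with the updated dictionary."""
    dictionary = set(seed_dictionary)
    leftovers = []
    for rec in records:
        for clause in split_clauses(rec):
            leftovers.extend(match_dictionary(clause, dictionary)[1])
    dictionary |= ngram_candidates(leftovers, min_count=min_count)
    structured = [
        [t for clause in split_clauses(rec)
         for t in match_dictionary(clause, dictionary)[0]]
        for rec in records
    ]
    return structured, dictionary

# Toy input, invented for illustration only (not taken from the paper's corpus).
records = ["头痛,失眠多梦", "咳嗽,失眠多梦,头痛"]
structured, updated = structure_records(records, {"头痛", "咳嗽"})
print(structured)
```

In this sketch the unseen term 失眠多梦 enters the dictionary because it recurs across records, while its shorter substrings are dropped by the filter. A real system would need a larger corpus, a tuned threshold, and manual review of new dictionary entries.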