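Affiliation: [1] Guangzhou University of Chinese Medicine, Guangzhou 510006
Source: China Journal of Traditional Chinese Medicine and Pharmacy, 2017, No. 5, pp. 2159-2162 (4 pages)
Funding: Doctoral Program Foundation of the Ministry of Education (No.20114425110009)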
Abstract: Objective: To explore the extraction and standardization of traditional Chinese medicine (TCM) symptom information by comparing two symptom extraction programs based on the maximum probability method. Methods: All data were analyzed and processed in R 3.3.2. The textbooks Diagnostics and Diagnostics of Traditional Chinese Medicine, together with 1 000 annotated pneumonia inpatient medical records, were used to build a symptom standardization database, a symptom description lexicon, and a keyword-adjective lexicon. Based on the maximum probability method, a Chinese word segmentation program (CSP), a direct extraction program (DEP), and a combination extraction program (CEP) were designed. The three programs were then used to extract and standardize the symptoms in 2 311 pneumonia medical records and were compared in terms of the number of dimensions generated, the amount of manual processing required, and symptom extraction performance. Results: Compared with CSP, both DEP and CEP effectively reduced the dimensionality; CEP also required a smaller percentage of manual processing and achieved better symptom extraction. Conclusion: CEP based on the maximum probability method can effectively extract TCM symptom information.
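The abstract names the maximum probability method but gives no implementation details. As a rough illustration of that general technique only, the sketch below chooses, among all ways of splitting a record into lexicon words, the split whose product of word probabilities is largest (dynamic programming in log space), then maps the resulting words to standard symptom terms. The toy lexicon and its frequencies, the unknown-character penalty, and the `STANDARD_TERMS` table are hypothetical stand-ins, not the authors' databases, and the authors worked in R 3.3.2 rather than Python.

```python
import math

# Hypothetical toy lexicon: word -> raw frequency (NOT data from the paper).
LEXICON = {
    "咳嗽": 50, "咳": 10, "嗽": 2,
    "痰": 20, "黄": 15, "痰黄": 8, "黄痰": 12,
    "发热": 40, "发": 5, "热": 10,
    "恶寒": 9, "恶": 3, "寒": 4,
}
TOTAL = sum(LEXICON.values())
# Assumed penalty for single characters that are not in the lexicon.
UNKNOWN_LOGP = math.log(1 / (TOTAL * 10))
MAX_WORD_LEN = max(len(w) for w in LEXICON)

# Hypothetical standardization table: surface form -> standard symptom term.
STANDARD_TERMS = {"咳嗽": "咳嗽", "痰黄": "黄痰", "黄痰": "黄痰", "发热": "发热", "恶寒": "恶寒"}


def word_logp(word):
    """Log probability of a word; unknown single characters get a small fixed value."""
    if word in LEXICON:
        return math.log(LEXICON[word] / TOTAL)
    return UNKNOWN_LOGP if len(word) == 1 else None


def max_prob_segment(text):
    """Split text so that the product of word probabilities is maximal."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log probability of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            lp = word_logp(text[start:end])
            if lp is None or best[start] == -math.inf:
                continue
            if best[start] + lp > best[end]:
                best[end] = best[start] + lp
                back[end] = start
    # Walk the back-pointers from the end to recover the chosen words.
    words, end = [], n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


def standardize(words):
    """Keep only words found in the standardization table, as standard terms."""
    return [STANDARD_TERMS[w] for w in words if w in STANDARD_TERMS]
```

With this toy lexicon, `max_prob_segment("咳嗽痰黄发热")` returns `['咳嗽', '痰黄', '发热']`, and `standardize` maps it to `['咳嗽', '黄痰', '发热']`. Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long records.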
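Keywords: symptom; text mining; text data structuring; Chinese word segmentation; maximum probability method; standardization
Classification number: R241 [Medicine and Health: Diagnostics of Traditional Chinese Medicine]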