Discussion on the extraction and standardization of TCM symptom based on maximum probability method  Cited by: 5

Authors: LIANG Likeng (梁礼铿)[1], LI Jingbo (黎敬波)[1]

Affiliation: [1] Guangzhou University of Chinese Medicine, Guangzhou 510006, China

Source: China Journal of Traditional Chinese Medicine and Pharmacy (《中华中医药杂志》), 2017, No. 5, pp. 2159-2162 (4 pages)

Funding: Doctoral Program Foundation of the Ministry of Education (No.20114425110009)

Abstract: Objective: To explore the extraction and standardization of traditional Chinese medicine (TCM) symptom information by comparing two symptom extraction programs based on the maximum probability method. Methods: All data were analyzed and processed in R 3.3.2. Diagnostics, Diagnostics of Traditional Chinese Medicine, and 1,000 annotated inpatient pneumonia records were used to build a symptom standardization database, a symptom description lexicon, and a keyword-adjective lexicon. Based on the maximum probability method, three programs were designed: a Chinese word segmentation program (CSP), a direct extraction program (DEP), and a combination extraction program (CEP). The three programs were used to extract and standardize symptom information from 2,311 pneumonia medical records and were compared in terms of the dimensions generated, the amount of manual processing required, and symptom extraction performance. Results: Compared with CSP, both DEP and CEP effectively reduced the dimension; CEP required a lower percentage of manual processing and extracted symptoms more effectively. Conclusion: The CEP based on the maximum probability method can effectively extract TCM symptom information.
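
The abstract names the maximum probability method but does not show it. As a rough illustration only, the general technique can be sketched as a dynamic program that picks the segmentation whose word probabilities have the largest product. The sketch below is in Python rather than the R environment the authors used, and it is not their implementation: the function name `max_prob_segment`, the toy lexicon, its probabilities, and the mapping to standard terms are all hypothetical values chosen for the example.

```python
import math

# Hypothetical toy lexicon: word -> probability (illustrative values only,
# not taken from the paper's symptom description lexicon).
TOY_LEXICON = {
    "发热": 0.05, "发": 0.005, "热": 0.01,
    "咳嗽": 0.05, "咳": 0.01, "嗽": 0.001,
    "痰黄": 0.01, "痰": 0.02, "黄": 0.02,
}

# Hypothetical mapping from extracted descriptions to standardized terms.
TOY_STANDARD_TERMS = {"发热": "fever", "咳嗽": "cough", "痰黄": "yellow sputum"}


def max_prob_segment(text, word_prob, max_len=6, unk_prob=1e-8):
    """Segment `text` so that the product of word probabilities is maximal."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_prob:
                p = word_prob[word]
            elif end - start == 1:
                p = unk_prob      # assumed fallback for unknown single characters
            else:
                continue          # unknown multi-character strings are not candidates
            score = best[start] + math.log(p)
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


def standardize(words, standard_terms):
    """Keep only words that have an entry in the standard-term mapping."""
    return [standard_terms[w] for w in words if w in standard_terms]


segments = max_prob_segment("发热咳嗽痰黄", TOY_LEXICON)
print(segments)
print(standardize(segments, TOY_STANDARD_TERMS))
```

Working in log-probabilities avoids numeric underflow on long clinical notes, and the single-character fallback guarantees that every position in the text is reachable even when the lexicon is incomplete.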

Keywords: symptom; text mining; text data structuring; Chinese word segmentation; maximum probability method; standardization

Classification number: R241 [Medicine and Health: TCM Diagnostics]

 
