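Research on Automatic Interpretation and Expression of Word Co-occurrence Clustering Results Based on Medical Subject Heading Indexing Rules  Cited by: 1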

Automatic Expression of Co-occurrence Clustering Based on Indexing Rules of Medical Subject Headings


Authors: Wu Jinming (邬金鸣); Hou Yuefang (侯跃芳); Cui Lei (崔雷) (Institute of Medical Information/Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China; College of Medical Informatics, China Medical University, Shenyang 110122, China)

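Affiliations: [1] Institute of Medical Information/Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020 [2] College of Medical Informatics, China Medical University, Shenyang 110122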

Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2020, No. 9, pp. 133-144 (12 pages)

Abstract: [Objective] To explore a standardized, automated way of interpreting and expressing clustering results that users can readily understand, and thereby promote the development of subject heading co-occurrence clustering. [Methods] Taking neoplasm diagnosis as the example topic, the authors consulted indexing textbooks to compile the relevant subject heading/subheading indexing rules. They selected 10 groups of neoplasms as a training set for co-occurrence clustering analysis of high-frequency subject headings, manually reviewed the clustering results and, in light of the indexing rules, derived semantic type/subheading combination rules for high-frequency subject headings. A Python program written from these rules automatically interpreted the clustering results for the 4 groups of neoplasms in the validation set, and 12 experts rated its output on how well it revealed cluster content in terms of accuracy, comprehensiveness, practicality, comprehensibility and simplicity. [Results] 30 indexing rules were compiled, and 98 semantic type/subheading combination rules for interpreting subject heading co-occurrence clustering results were derived. On the validation set, the five evaluation indicators (accuracy, comprehensiveness, practicality, comprehensibility and simplicity) scored 4.282, 4.435, 4.209, 4.457 and 4.206 respectively (out of 5). [Limitations] The combination rules that emerge depend on the high-frequency threshold chosen and the number of clusters determined in each clustering run. Interpreting cluster content with combination rules also struggles to reveal a cluster's "hidden information". [Conclusions] Rule-based automatic interpretation of subject heading co-occurrence clustering results is broadly applicable and, to some extent, makes the expression of such results more objective and standardized.
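The pipeline summarized in the abstract (select high-frequency subject headings, count their co-occurrence, then read a cluster through semantic type/subheading combination rules) might look roughly like the sketch below. This is not the authors' program: the records, the semantic type assignments, the three rules and every function name are invented for illustration, and the clustering step itself is omitted because the abstract does not name an algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical semantic types, hand-assigned for this sketch only.
SEMANTIC_TYPE = {
    "Lung Neoplasms": "Neoplastic Process",
    "Tomography, X-Ray Computed": "Diagnostic Procedure",
    "Biomarkers, Tumor": "Biologically Active Substance",
}

# Hypothetical combination rules: (semantic type, subheading) -> phrase template.
# The paper reports 98 such rules; these three are made up.
RULES = {
    ("Neoplastic Process", "diagnostic imaging"): "imaging diagnosis of {h}",
    ("Diagnostic Procedure", "methods"): "methods of {h}",
    ("Biologically Active Substance", "blood"): "blood levels of {h}",
}


def high_frequency_terms(records, min_count):
    """Return the terms indexed in at least min_count records."""
    counts = Counter(term for rec in records for term in set(rec))
    return {term for term, n in counts.items() if n >= min_count}


def cooccurrence_counts(records, vocab):
    """Count how often each pair of high-frequency terms shares a record."""
    pairs = Counter()
    for rec in records:
        kept = sorted(set(rec) & vocab)  # sorted so each pair has one key
        pairs.update(combinations(kept, 2))
    return pairs


def interpret_cluster(cluster):
    """Turn 'Heading/subheading' terms into phrases; skip terms with no rule."""
    phrases = []
    for term in cluster:
        heading, _, subheading = term.partition("/")
        rule = RULES.get((SEMANTIC_TYPE.get(heading), subheading))
        if rule is not None:
            phrases.append(rule.format(h=heading))
    return phrases


# Invented toy records: the heading/subheading pairs indexed for four articles.
records = [
    ["Lung Neoplasms/diagnostic imaging", "Tomography, X-Ray Computed/methods"],
    ["Lung Neoplasms/diagnostic imaging", "Tomography, X-Ray Computed/methods",
     "Biomarkers, Tumor/blood"],
    ["Lung Neoplasms/diagnostic imaging", "Biomarkers, Tumor/blood"],
    ["Lung Neoplasms/pathology"],
]

vocab = high_frequency_terms(records, min_count=2)
pairs = cooccurrence_counts(records, vocab)
# A cluster would normally come from clustering the co-occurrence matrix;
# here one is simply written down by hand.
cluster = ["Lung Neoplasms/diagnostic imaging", "Tomography, X-Ray Computed/methods"]
print(interpret_cluster(cluster))
```

Changing `min_count` changes which terms enter the co-occurrence counts at all, which is one way to see why the abstract lists the high-frequency threshold among its limitations.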

Keywords: Co-word analysis; Cluster analysis; Cluster description; Knowledge expression; Automatic interpretation

Classification code: G202 [Cultural Sciences: Communication Studies]

 
