Research on text clustering of medical community questions based on multi-feature fusion  (Cited by: 1)


Authors: Shen Xifeng (申喜凤); Li Meiting (李美婷); Zhang Weining (张维宁); Nan Jiale (南嘉乐); Sun Yuanyuan (孙媛媛); Fu Yuwei (付玉伟); Gao Dongping (高东平)[1] (Institute of Medical Information, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China; Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College)

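Affiliations: [1] Institute of Medical Information, Chinese Academy of Medical Sciences / Peking Union Medical College, Beijing 100020 [2] Chinese Academy of Medical Sciences / Peking Union Medical College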

Source: China Digital Medicine (《中国数字医学》), 2022, No. 12, pp. 28-34 (7 pages)

Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0104905).

Abstract: Objective: Medical question texts lack contextual semantics and have sparse, high-dimensional features. To improve clustering on such texts, this paper proposes a text representation that fuses semantic features with topic features. Methods: Using question texts from a medical community as the data source, weighted fastText word-level semantic features and LDA document-topic features were fused to represent each question, and the fused features were used for question text clustering. Clustering quality was evaluated with clustering accuracy (ACC) and normalized mutual information (NMI). Results: The feature-fusion clustering model performed best among the methods compared, with an ACC of 0.577 and an NMI of 0.429, both higher than the relevant baseline models. Conclusion: The experiments show that feature fusion represents medical question texts more comprehensively, accurately, and effectively, and can serve as a reference for feature representation and clustering-based knowledge discovery on medical question texts.
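The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two feature types are combined, how words are weighted, or which NMI normalization is used, so the choices below are assumptions: fusion is done by concatenation, word weights are supplied by the caller (for example TF-IDF scores), and NMI is normalized by the arithmetic mean of the two entropies. The function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import permutations


def fuse_features(word_vectors, word_weights, topic_dist):
    """Weighted average of word vectors, concatenated with a topic distribution.

    Assumption: fusion by concatenation; the paper may combine features differently.
    """
    dim = len(word_vectors[0])
    total = sum(word_weights)
    semantic = [
        sum(w * vec[i] for w, vec in zip(word_weights, word_vectors)) / total
        for i in range(dim)
    ]
    return semantic + list(topic_dist)


def clustering_accuracy(true_labels, pred_labels):
    """ACC: best accuracy over one-to-one mappings from clusters to classes.

    Brute force over permutations, so only suitable for a handful of clusters;
    larger problems would normally use the Hungarian algorithm instead.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    # Pad with None so every cluster can be assigned even if clusters outnumber classes.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best = max(best, hits)
    return best / len(true_labels)


def nmi(true_labels, pred_labels):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    n = len(true_labels)
    ct = Counter(true_labels)
    cp = Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))
    mi = sum(
        (c / n) * math.log(c * n / (ct[t] * cp[p])) for (t, p), c in joint.items()
    )
    h_true = -sum((c / n) * math.log(c / n) for c in ct.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in cp.values())
    denom = (h_true + h_pred) / 2
    # Both partitions are trivial (single group): treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0
```

In such a setup, each question would be passed through `fuse_features`, the fused vectors clustered with an algorithm of choice, and the resulting cluster labels scored against the gold categories with `clustering_accuracy` and `nmi`.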

Keywords: LDA; fastText model; feature fusion; clustering; question text

Classification code: R319 [Medicine and Health: Basic Medicine]

 
