Authors: Shen Xifeng; Li Meiting; Zhang Weining; Nan Jiale; Sun Yuanyuan; Fu Yuwei; Gao Dongping [1] (Institute of Medical Information, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China; Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College)
Affiliations: [1] Institute of Medical Information, Chinese Academy of Medical Sciences / Peking Union Medical College, Beijing 100020; [2] Chinese Academy of Medical Sciences / Peking Union Medical College
Source: China Digital Medicine (《中国数字医学》), 2022, No. 12, pp. 28-34 (7 pages)
Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0104905).
Abstract: Objective: Medical question texts lack contextual semantics and have sparse, high-dimensional features. To improve clustering of such texts, this paper proposes a text representation method that fuses semantic features with topic features. Methods: Using question texts from an online medical community as the data source, weighted fastText lexical semantic features and LDA document-topic features were fused to represent each question text, and the fused features were used for question text clustering. Clustering quality was evaluated with clustering accuracy (ACC) and normalized mutual information (NMI). Results: The feature-fusion clustering model performed best among the methods compared, with a clustering accuracy of 0.577 and an NMI of 0.429, both higher than those of the baseline models. Conclusion: The experiments show that feature fusion represents medical question texts more comprehensively, accurately, and effectively, and can serve as a reference for feature representation and clustering-based knowledge discovery on medical question texts.
Keywords: LDA; fastText model; feature fusion; clustering; question text
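
The abstract names the ingredients of the method (weighted fastText word vectors, LDA document-topic features, evaluation by ACC and NMI) but not the details. The following is a minimal sketch under stated assumptions, not the authors' implementation: fusion is assumed to be simple concatenation, the word weights are assumed to be something like TF-IDF scores supplied by the caller, and NMI is assumed to use arithmetic-mean normalization. The function names `fuse_features`, `clustering_accuracy`, and `normalized_mutual_info` are hypothetical, and the inputs are toy values rather than data from the paper.

```python
import math
from collections import Counter
from itertools import permutations


def fuse_features(word_vectors, word_weights, topic_dist):
    """Weighted mean of word vectors, concatenated with a topic distribution.

    Assumption: fusion is plain concatenation; the abstract does not say how
    the two feature sets are combined or how the word weights are computed.
    """
    total = sum(word_weights)
    dim = len(word_vectors[0])
    semantic = [
        sum(w * vec[i] for w, vec in zip(word_weights, word_vectors)) / total
        for i in range(dim)
    ]
    return semantic + list(topic_dist)


def clustering_accuracy(y_true, y_pred):
    """ACC: best accuracy over one-to-one mappings of cluster ids to classes.

    Brute force over permutations, so only practical for a few clusters.
    """
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    pairs = Counter(zip(y_pred, y_true))
    # Pad with dummy classes so every cluster can be assigned something.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    best = max(
        sum(pairs[(c, k)] for c, k in zip(clusters, perm))
        for perm in permutations(padded, len(clusters))
    )
    return best / len(y_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(y_true, y_pred):
    """NMI = I(true; pred) / mean(H(true), H(pred)) (assumed normalization)."""
    n = len(y_true)
    count_true = Counter(y_true)
    count_pred = Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(y_true) + entropy(y_pred)) / 2
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    # Toy document: two 2-d word vectors, illustrative weights, 2 topics.
    fused = fuse_features([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0], [0.2, 0.8])
    print("fused vector:", fused)

    # Toy labels: cluster ids are arbitrary, so ACC must search mappings.
    y_true = [0, 0, 1, 1]
    y_pred = [1, 1, 0, 1]
    print("ACC:", clustering_accuracy(y_true, y_pred))
    print("NMI:", normalized_mutual_info(y_true, y_pred))
```

In practice the fused vectors would be passed to a clustering algorithm, and for more than a handful of clusters the permutation search in `clustering_accuracy` would normally be replaced by the Hungarian algorithm.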