检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:赵悦阳[1] 崔雷[2] ZHAO Yueyang;CUI Lei(Library of Shengjing Hospital of China Medical University,Shenyang 110004,China;School of Health Management,China Medical University,Shenyang 110122,China)
机构地区:[1]中国医科大学附属盛京医院图书馆,沈阳110004 [2]中国医科大学医学健康管理学院,沈阳110122
出 处:《医学信息学杂志》2024年第3期58-64,共7页Journal of Medical Informatics
基 金:辽宁省社会科学规划基金资助项目(项目编号:L20BTQ003)。
摘 要:目的/意义弥补医学文本语义表示方面的不足,实现PubMed数据库检索结果聚类。方法/过程采用Jaccard系数和TF-IDF构建融合矩阵方法,建立短语间、文档间、短语与文档内容间的相似性关系融合矩阵,训练聚类算法,将PubMed数据库检索结果集合分组,随后生成类别标签,描述每一类簇文档的含义。结果/结论基于融合矩阵的聚类效果较好,提取出描述类别的高频词能很好地区分类别含义,对检索结果文本聚类任务有效。Purpose/Significance To solve the deficiencies in the semantic representation of medical texts,and to realize the clustering of the retrieval results of the PubMed database.Method/Process The paper proposes a method to construct a fusion matrix by using the Jaccard coefficient and TF-IDF.Similarity relations between phrases,documents,and the contents of phrases and documents are combined to construct a fusion matrix,and several clustering algorithms are trained to group a collection of documents from the PubMed database.Category annotations are created to describe the meaning of each category of clustered documents.Result/Conclusion Experimental results show that the fusion matrix-based clustering is superior in grouping the document sets,and the extracted high-frequency words in the category descriptions distinguish the meanings of the categories well,so the fusion matrix design is effective for clustering descriptions of academic texts.
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.120