检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]北华大学信息技术与传媒学院,吉林132013 [2]中国科学技术信息研究所,北京100038
出 处:《图书情报工作》2018年第4期113-120,共8页Library and Information Service
基 金:吉林省教育科学“十三五”规划项目“项目教学法在高校基础计算机教学中的应用研究”(项目编号:GH170061)研究成果之一
摘 要:[目的/意义]探索实践以科技报告为文献载体形式的融合主题模型的文本聚类方法,拓展基于科技文献进行技术监测服务的新领域,提出基于科技报告进行语义分析的新方法。[方法/过程]以国家科技报告服务系统中的科技报告为数据源,首先基于LDA主题模型对经过文本预处理的科技报告进行主题挖掘,再基于Ward与K-means相结合的聚类算法对包含主题分布信息的文本向量进行聚类分析,尝试提出一种适合科技报告文档聚类的文本挖掘新方法。[结果/结论]实验结果表明,LDA主题模型能有效准确挖掘科技报告中的主题信息,所提出的Ward与K-means相结合的聚类算法对科技报告的聚类效果也优于其它传统聚类算法。[ Purpose/significance] This paper explores the method of text clustering in the science and technology reports based on the topic model, develops new scientific literature technology monitoring areas, and puts forward a new semantic analysis method based on science and technology reports. [ Method/process] Based on the national science and technology report service system, firstly, it conducted topic mining based on the LDA model after the text preprocessing; secondly, a clustering analysis based on the combination of K-means and Ward was carried out based on the text vector of the abstract containing theme distribution information. A proper text clustering method for the text mining suitable for the science and technical report was proposed. [ Result/conclusion] The experimental results show that the LDA model can be effectively and accurately used in the topic mining of science and technology reports, and the clustering effect of the combination of Ward and K-means proposed in this paper is better than that of other traditional clustering algorithms in sci- ence and technology reports.
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:18.119.131.131