Theme Analysis Based on Keyword Importance and Affinity Propagation Clustering (基于关键词重要性和近邻传播聚类的主题分析研究)  Cited by: 27


Authors: Li Hailin (李海林) [1,2]; Wan Xiaoji (万校基) [1]; Lin Chunpei (林春培) [1]

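Affiliations: [1] College of Business Administration, Huaqiao University, Quanzhou 362021; [2] Research Center of Applied Statistics and Big Data, Huaqiao University, Xiamen 361021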

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2018, No. 5, pp. 533-542 (10 pages)

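Funding: National Natural Science Foundation of China project "Clustering Analysis of High-Dimensional Time Series Data and Its Applications" (71771094); Fujian Provincial Social Science Planning Project "Analysis of Journal References and Citing Literature Based on Time Series Data Mining" (FJ2017B065)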

Abstract: Traditional scientometric methods have two weaknesses: co-occurrence analysis ignores keyword importance, and existing theme analysis techniques cannot adaptively extract core themes. To address both, this paper proposes a theme analysis method based on keyword importance and affinity propagation clustering. The method rests on the assumption that most authors tend to list a paper's keywords in descending order of relevance to the research content. It computes the importance of each keyword in each paper, constructs a similarity matrix over the main keywords, and applies affinity propagation clustering, which returns the most representative member of each cluster, to extract and analyze core themes. The study mines the keywords of papers published in a library and information science journal from 2012 to 2016, clusters the main keywords using the importance measure, and analyzes the evolutionary trends of the main keywords and core themes. The proposed method accounts for keyword importance and identifies core themes automatically; it also offers a new data mining approach to thematic analysis of literature and can effectively improve theme identification for journals, disciplines, and related fields.
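The first two steps of the method described in the abstract (position-based keyword importance, then a similarity matrix over keywords) can be sketched as follows. The abstract gives no formulas, so both the weighting and the similarity used here are assumptions made for illustration, not the authors' definitions: a keyword at 0-based position pos among n keywords is given weight (n - pos) / n, and the similarity of two keywords is the sum, over papers containing both, of the product of their weights. The function names and the sample keyword lists are hypothetical.

```python
from itertools import combinations


def keyword_importance(keywords):
    """ASSUMED weighting (not from the paper): with n keywords listed in
    order, the keyword at 0-based position pos gets weight (n - pos) / n."""
    n = len(keywords)
    weights = {}
    for pos, kw in enumerate(keywords):
        # Keep the first (highest) weight if a keyword is listed twice.
        weights.setdefault(kw, (n - pos) / n)
    return weights


def similarity_matrix(papers):
    """ASSUMED similarity: for each pair of keywords, sum the product of
    their importance weights over every paper in which both appear."""
    vocab = sorted({kw for paper in papers for kw in paper})
    index = {kw: i for i, kw in enumerate(vocab)}
    sim = [[0.0] * len(vocab) for _ in vocab]
    for paper in papers:
        weights = keyword_importance(paper)
        for a, b in combinations(weights, 2):
            i, j = index[a], index[b]
            contribution = weights[a] * weights[b]
            sim[i][j] += contribution
            sim[j][i] += contribution  # keep the matrix symmetric
    return vocab, sim


# Hypothetical keyword lists, in the order the authors gave them.
papers = [
    ["theme analysis", "keyword importance", "clustering"],
    ["keyword importance", "clustering"],
]
vocab, sim = similarity_matrix(papers)
for kw, row in zip(vocab, sim):
    print(kw, [round(x, 3) for x in row])
```

A symmetric matrix of this kind is the sort of precomputed similarity input that an affinity propagation implementation can take; scikit-learn's AffinityPropagation, for example, accepts one when constructed with affinity="precomputed". Unlike plain co-occurrence counts, a pair of keywords listed early in a paper contributes more here than a pair listed late.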

Keywords: theme analysis; keyword importance; affinity propagation clustering; core theme

Classification No.: G353.1 [Cultural Science: Information Science]

 
