Research on an Improved K-means Algorithm Based on Co-citation Analysis of Academic Literature (cited by: 4)

Improvement of K-means Algorithm Based on Co-citation Analysis

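Affiliations: [1] Department of Information Management, Nanjing University, Nanjing 210093; [2] Institute of Scientific and Technical Information of China, Beijing 100038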

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2012, No. 1, pp. 82-94 (13 pages)

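Funding: This work was supported by the National Social Science Fund of China project "Research on Relevance Integration for Chinese Academic Information Retrieval Systems" (grant No. 10CTQ027), the Ministry of Education Humanities and Social Sciences Research Planning Fund project "Research on User-oriented Relevance Criteria and Their Application" (grant No. 07JA870006), and a collaborative research project of the Institute of Scientific and Technical Information of China.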

Abstract: K-means is a widely used clustering algorithm, but it leaves two problems open: how to choose the initial cluster centers and how to choose the number of clusters K. This paper proposes an improved K-means algorithm that selects both from a co-citation analysis of academic literature. It is a two-step clustering algorithm. The first step performs co-citation analysis on the document set to obtain a co-citation matrix and then runs hierarchical clustering on that matrix. At each iteration, the algorithm records the distance between the documents merged into one cluster and the difference between this distance and the one recorded at the previous iteration. The number of clusters at the iteration where this difference reaches its maximum is taken as the K value for the second step, and the cluster centers at that point are taken as the second step's initial cluster centers. The second step runs K-means on the content of the documents. Experiments comparing the method with the classic K-means algorithm and with an improved K-means algorithm based on agglomerative hierarchical clustering show that the proposed algorithm achieves better clustering quality.
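The two-step procedure in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all assumptions not given in the abstract: converting co-citation counts to distances as 1 / (1 + count), using average linkage, reading "largest distance difference" as the largest jump between consecutive merge distances, representing document content as numeric vectors (for example TF-IDF), using Euclidean distance, and seeding K-means with the mean content vector of each co-citation cluster. The function names are hypothetical.

```python
import math
from itertools import combinations


def cocitation_matrix(citing_lists, n_docs):
    """C[i][j] = number of citing papers whose reference lists contain both i and j."""
    C = [[0] * n_docs for _ in range(n_docs)]
    for refs in citing_lists:
        for i, j in combinations(sorted(set(refs)), 2):
            C[i][j] += 1
            C[j][i] += 1
    return C


def hierarchical_k_and_groups(C):
    """Average-linkage agglomeration on an ASSUMED distance d = 1 / (1 + co-citations).

    Records every merge distance, finds the largest jump between consecutive
    merges, and returns the clusters that exist just before that jump (K = len).
    """
    def dist(i, j):
        return 1.0 / (1.0 + C[i][j])

    n = len(C)
    clusters = [[i] for i in range(n)]
    history = []  # (merge distance, clusters after this merge)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist(i, j) for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        history.append((d, [list(c) for c in clusters]))
    if len(history) < 2:  # too few documents to measure a jump
        return [list(range(n))]
    gaps = [history[t + 1][0] - history[t][0] for t in range(len(history) - 1)]
    t = max(range(len(gaps)), key=gaps.__getitem__)
    return history[t][1]


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(X, centers, max_iter=100):
    """Plain Lloyd iterations on content vectors X, starting from the given centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))
                      for x in X]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = mean(members)
    return labels, centers


def two_step_cluster(citing_lists, X):
    """Step 1: co-citation clustering picks K and seeds. Step 2: K-means on content."""
    groups = hierarchical_k_and_groups(cocitation_matrix(citing_lists, len(X)))
    seeds = [mean([X[i] for i in g]) for g in groups]
    return kmeans(X, seeds)


if __name__ == "__main__":
    citing = [[0, 1, 2], [0, 1], [1, 2], [3, 4, 5], [3, 4], [4, 5]]
    X = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
    labels, centers = two_step_cluster(citing, X)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

In the toy example, the citing papers form two co-citation groups, {0, 1, 2} and {3, 4, 5}. The largest jump in merge distance occurs when these two groups would be joined, so the sketch takes K = 2 and starts K-means from the two group means. Average linkage is used because its merge distances never decrease, which keeps the "difference between consecutive iterations" well defined.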

Keywords: K-means algorithm; K value; initial cluster centers; co-citation; document clustering

CLC number: F830.4 [Economics and Management / Finance]
