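Authors: Wu Suhui (吴夙慧)[1], Cheng Ying (成颖)[1], Zheng Yanning (郑彦宁)[2], Pan Yuntao (潘云涛)[2]
Affiliations: [1] Department of Information Management, Nanjing University, Nanjing 210093; [2] Institute of Scientific and Technical Information of China, Beijing 100038
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2012, No. 1, pp. 82-94 (13 pages)
Funding: Supported by the National Social Science Fund project "Research on Relevance Integration for Chinese Academic Information Retrieval Systems" (Grant No. 10CTQ027), the Ministry of Education Humanities and Social Sciences Research Planning Fund project "Research on User-Oriented Relevance Criteria and Their Application" (Grant No. 07JA870006), and a cooperative research project of the Institute of Scientific and Technical Information of China.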
Abstract: K-means is a widely used clustering algorithm, but it leaves two problems open: how to choose the initial cluster centers and how to choose the number of clusters K. This paper proposes an improved K-means algorithm that selects both from a co-citation analysis of academic literature. The algorithm clusters in two steps. In the first step, a co-citation analysis of the document set yields a co-citation matrix, and hierarchical clustering is run on that matrix. At each iteration the algorithm records the distance between the documents merged into one cluster and the difference between the distances of two successive iterations. When that difference reaches its maximum, the corresponding number of clusters is taken as K, and the cluster centers at that point are taken as the initial cluster centers for the second step. In the second step, K-means is run on the content of the documents. Experiments comparing the method with classic K-means and with an improved K-means based on agglomerative hierarchical clustering show that the proposed algorithm achieves better clustering quality.
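The two-step procedure described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how co-citation counts become distances, which linkage the hierarchical step uses, or how the step-one cluster centers are expressed in content space. The sketch assumes a `1 / (1 + count)` distance, average linkage, Euclidean distance on content vectors, and seeds computed as the content-space mean of each step-one cluster. All function names are hypothetical, and the input needs at least three documents.

```python
import math

def cocitation_distance(cocite):
    """Map co-citation counts to distances; 1 / (1 + count) is an assumed mapping."""
    n = len(cocite)
    return [[0.0 if i == j else 1.0 / (1.0 + cocite[i][j]) for j in range(n)]
            for i in range(n)]

def average_linkage(dist, a, b):
    """Mean pairwise distance between clusters a and b (assumed linkage)."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

def choose_partition(dist):
    """Step 1: agglomerate down to one cluster, recording every merge distance.

    Returns the partition that existed just before the merge whose distance
    jumped the most relative to the previous merge. Its size serves as K.
    """
    clusters = [[i] for i in range(len(dist))]
    history = []  # (merge distance, partition before that merge)
    while len(clusters) > 1:
        d, p, q = min((average_linkage(dist, clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        history.append((d, [list(c) for c in clusters]))
        merged = clusters[p] + clusters[q]
        clusters = [c for i, c in enumerate(clusters) if i not in (p, q)] + [merged]
    best = max(range(1, len(history)),
               key=lambda t: history[t][0] - history[t - 1][0])
    return history[best][1]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, seeds, max_iter=100):
    """Step 2: plain K-means on content vectors, started from the given seeds."""
    centers = [list(c) for c in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: math.dist(v, centers[k]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old center if a cluster went empty
                centers[k] = centroid(members)
    return labels, centers

def two_step_cluster(cocite, content_vectors):
    """Pick K and seeds from co-citation data, then cluster on content."""
    partition = choose_partition(cocitation_distance(cocite))
    seeds = [centroid([content_vectors[i] for i in members]) for members in partition]
    return kmeans(content_vectors, seeds)

# Toy data (invented for illustration): documents 0-2 are co-cited with each
# other, as are documents 3-5, with no co-citation across the two groups.
cocite = [[0, 5, 4, 0, 0, 0],
          [5, 0, 4, 0, 0, 0],
          [4, 4, 0, 0, 0, 0],
          [0, 0, 0, 0, 5, 4],
          [0, 0, 0, 5, 0, 4],
          [0, 0, 0, 4, 4, 0]]
content = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
           [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centers = two_step_cluster(cocite, content)
print(len(centers), labels)
```

On the toy data the largest jump in merge distance occurs at the final merge, so the sketch keeps two clusters and K-means places documents 0-2 in one cluster and documents 3-5 in the other. Taking the partition just before the largest jump is one reading of "the number of clusters when the difference is largest"; the paper may define the cut differently.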