Heuristic Method of Determining the Number of Clusters

Cited by: 7


Authors: LU Jian-yun [1,2]; ZHU Qing-sheng [1,3]; WU Quan-wang [1] (School of Computer, Chongqing University, Chongqing 400044, China; Chongqing Key Laboratory of Software Theory & Technology, Chongqing University, Chongqing 400044, China; School of Software, Chongqing College of Electronic Engineering, Chongqing 401331, China)

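Affiliations: [1] School of Computer, Chongqing University, Chongqing 400044, China; [2] School of Software, Chongqing College of Electronic Engineering, Chongqing 401331, China; [3] Chongqing Key Laboratory of Software Theory & Technology, Chongqing University, Chongqing 400044, China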

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2018, No. 7, pp. 1381-1385 (5 pages)

Funding: Supported by the National Natural Science Foundation of China (61272194)

Abstract: Cluster analysis is one of the most important tasks in data mining, and many clustering algorithms have been applied successfully to image clustering, text clustering, information retrieval, social networks, and other areas. For datasets with complex structure and unbalanced distribution (clusters of different sizes, shapes, and densities), however, determining the best number of clusters is particularly difficult. This paper proposes a heuristic method for determining the best number of clusters on such datasets. First, a random walk model is built to rank the data points by importance (their global scores), and the k-nearest-neighbor distance graph (k-dist graph) is used to determine the number of important data points, which excludes the influence of noise and unimportant (border) points on density changes between and within clusters. Second, two heuristic rules (the gap of the k-nearest neighbors chain and the nearest neighbor gap of the k-nearest neighbors chain) are used to build a decision graph that determines the best number of clusters and identifies the representative points of the clusters. Finally, the clustering result is obtained with a nearest distance propagation algorithm. Experiments show that the method finds the best number of clusters quickly and accurately, and that the proposed clustering algorithm achieves comparatively good results against popular clustering algorithms.
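The first step of the abstract (ranking points with a random walk and reading a k-dist graph) can be illustrated roughly as follows. This is only a sketch under stated assumptions: the abstract does not give the transition probabilities or the construction of the graph, so the code assumes a PageRank-style walk with restart on a directed k-nearest-neighbor graph. The function names `knn`, `random_walk_scores`, and `k_dist_curve`, and the `damping` and `iters` parameters, are hypothetical and do not come from the paper.

```python
import math


def knn(points, k):
    """For each point, return its k nearest neighbors as (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        # Brute-force search; ties are broken by the smaller index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def random_walk_scores(neighbors, damping=0.85, iters=100):
    """Assumed PageRank-style walk: from point i, step to one of its k neighbors
    with equal probability, or restart at a uniformly random point."""
    n = len(neighbors)
    scores = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - damping) / n] * n
        for i, nbrs in enumerate(neighbors):
            share = damping * scores[i] / len(nbrs)
            for _, j in nbrs:
                nxt[j] += share
        scores = nxt
    return scores


def k_dist_curve(neighbors):
    """Sorted distances to each point's k-th nearest neighbor (the k-dist graph)."""
    return sorted(nbrs[-1][0] for nbrs in neighbors)
```

Under these assumptions, a point that appears in no other point's neighbor list (a likely noise point) receives only the restart mass and therefore the lowest score, while a sharp rise at the end of the sorted k-dist curve suggests where such low-importance points could be cut off.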

Keywords: cluster analysis; number of clusters; heuristic rules; random walk model; k-nearest neighbors chain

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
