Optimization of the Canopy bisecting K-Means algorithm based on density and center index  Cited by: 6

A Canopy bisecting K-Means algorithm based on density and central index


Authors: SHEN Guo-xin, JIANG Zhong-yun[2] (College of Information, Shanghai Ocean University, Shanghai 201306; College of Information, Shanghai Jian Qiao University, Shanghai 201306, China)

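Affiliations: [1] College of Information, Shanghai Ocean University, Shanghai 201306 [2] College of Information Technology, Shanghai Jian Qiao University, Shanghai 201306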

Source: Computer Engineering & Science, 2022, No. 2, pp. 372-380 (9 pages)

Funding: Fund for Applied Undergraduate Pilot Majors of Shanghai Municipal Universities (Z32004-17-84).

Abstract: To address the unstable clustering results of the bisecting K-means algorithm caused by randomly selected initial centers and a manually specified number of clusters, a Canopy bisecting K-means algorithm based on density and center index, SDC_Bisecting K-Means, is proposed. First, the algorithm calculates the density of the data in the sample and its neighborhood radius. Second, it selects the data with the smallest density and clusters them following the idea of the Canopy algorithm; the resulting number of clusters and their centers serve as the input parameters of the bisecting K-means algorithm. Finally, building on the bisecting K-means algorithm, it introduces an exponential function and a center index to cluster the original samples. Simulation experiments on UCI data sets and a self-built data set show that SDC_Bisecting K-Means not only produces more accurate clustering results but also runs faster and is more stable.
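The general pipeline in the abstract (a Canopy pass to estimate the number of clusters, then bisecting K-means) can be illustrated with a minimal sketch. This is not the authors' SDC_Bisecting K-Means: the abstract does not give the density measure, the neighborhood-radius rule, the exponential function, or the center index, so none of them are reproduced here. The sketch assumes the classic Canopy procedure with hand-picked thresholds `t1` and `t2`, plain bisecting K-means that splits the cluster with the largest sum of squared errors, and uses only the Canopy cluster count (not the centers) as input. All function names are hypothetical.

```python
import math
import random


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = mean(points)
    return sum(math.dist(p, center) ** 2 for p in points)


def canopy_centers(points, t1, t2):
    """Classic Canopy pass (assumes t1 > t2 > 0); returns one center per canopy.

    Assumption: the seed is simply the first remaining point, whereas the
    paper selects seeds by density.
    """
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]
        canopy = [p for p in remaining if math.dist(p, seed) < t1]
        centers.append(mean(canopy))
        # Points within the tight threshold t2 can no longer seed a canopy.
        remaining = [p for p in remaining if math.dist(p, seed) >= t2]
    return centers


def two_means(points, rng, iters=20):
    """Split one cluster in two with a few Lloyd iterations."""
    c = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearest = 0 if math.dist(p, c[0]) <= math.dist(p, c[1]) else 1
            groups[nearest].append(p)
        if not groups[0] or not groups[1]:
            break
        c = [mean(groups[0]), mean(groups[1])]
    return groups


def bisecting_kmeans(points, k, trials=5, seed=0):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda j: sse(clusters[j]))
        target = clusters[i]
        if len(target) < 2:
            break
        best = None
        for _ in range(trials):
            a, b = two_means(target, rng)
            if a and b and (best is None or sse(a) + sse(b) < sse(best[0]) + sse(best[1])):
                best = (a, b)
        if best is None:  # cluster could not be split (e.g. identical points)
            break
        clusters[i:i + 1] = best
    return clusters


# Toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers = canopy_centers(points, t1=5.0, t2=3.0)
clusters = bisecting_kmeans(points, k=len(centers))
print(len(centers), sorted(len(c) for c in clusters))
```

In this sketch the Canopy pass replaces the manually chosen cluster count, which is the instability the abstract targets; the thresholds `t1` and `t2` still have to be chosen by hand, which is what the paper's density and neighborhood-radius step is meant to avoid.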

Keywords: clustering; bisecting K-means algorithm; density; neighborhood radius; exponential function; center index

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
