Research on the best clustering number based on K-means algorithm
Times cited: 15

Authors: 王艳娥 (WANG Yan'e), 梁艳 (LIANG Yan)[1], 司海峰 (SI Haifeng)[1], 丁心安 (DING Xin'an) (School of Technology, Xi'an Siyuan University, Xi'an 710038, China)

Affiliation: [1] School of Technology, Xi'an Siyuan University, Xi'an, Shaanxi 710038, China

Source: Electronic Design Engineering (《电子设计工程》), 2020, Issue 24, pp. 52-56 (5 pages)

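Funding: Scientific Research Program of the Shaanxi Provincial Department of Education (18JK1100); Shaanxi Higher Education Scientific Research Project (XGH19236).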

Abstract: Clustering algorithms require the final number of clusters to be specified in advance. To address this problem, this paper proposes a new clustering validity index based on the intra-cluster compactness and inter-cluster dispersion computed over all samples of each cluster; the index can effectively determine the optimal number of clusters for a data set. The K-means algorithm is used while searching for the optimal cluster number. Because standard K-means chooses its initial cluster centers at random, the paper measures sample similarity with the Euclidean distance and, based on sample variance, selects the K samples with the smallest variance as the initial cluster centers. This prevents noise points from becoming initial centers, places the initial centers in dense regions of the data set, and makes the K-means results stable and effective. The optimized K-means algorithm and the new validity index are then used together to determine the number of clusters in a data set. Tests on UCI data sets and artificially simulated data sets show that, on spherical data sets with few noise points, the proposed algorithm effectively finds the optimal number of clusters and runs fast.
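The abstract gives no formulas, so the following Python sketch only illustrates the overall workflow (variance-based seeding, K-means, then scanning K with a validity index) under stated assumptions; it is not the authors' implementation. Assumptions: a sample's "variance" is taken to be the variance of its Euclidean distances to all other samples; a candidate closer than the mean pairwise distance to an already chosen seed is skipped (the abstract does not mention this rule); the index `validity_index` is a hypothetical stand-in (mean within-cluster pairwise distance divided by the smallest mean distance between two clusters, smaller is better), not the paper's index; K is scanned from 2 to floor(sqrt(n)), a common rule of thumb. All function names are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def variance_seeds(X, k, D=None):
    """Pick k initial centers among low-variance (dense-region) samples.

    Assumption: variance = variance of a sample's distances to all others.
    Assumption (not in the abstract): skip candidates within the mean
    pairwise distance of an already chosen seed.
    """
    if D is None:
        D = pairwise_distances(X)
    n = len(X)
    off_diag = ~np.eye(n, dtype=bool)
    var = np.array([D[i, off_diag[i]].var(ddof=1) for i in range(n)])
    radius = D[off_diag].mean()
    order = np.argsort(var)  # smallest variance first
    chosen = []
    for i in order:
        if len(chosen) == k:
            break
        if all(D[i, j] >= radius for j in chosen):
            chosen.append(i)
    # Fallback: if the radius rule ran out of candidates, take the next
    # lowest-variance samples that are not chosen yet.
    for i in order:
        if len(chosen) == k:
            break
        if i not in chosen:
            chosen.append(i)
    return X[chosen].copy()


def kmeans(X, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = centers.copy()
    for _ in range(max_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def validity_index(D, labels, k):
    """Hypothetical compactness/dispersion ratio (smaller is better)."""
    groups = [np.flatnonzero(labels == j) for j in range(k)]
    groups = [g for g in groups if len(g)]
    if len(groups) < 2:
        return np.inf
    tight = []
    for g in groups:
        m = len(g)
        # Mean pairwise distance inside the cluster (diagonal zeros excluded).
        tight.append(0.0 if m < 2 else D[np.ix_(g, g)].sum() / (m * (m - 1)))
    # Smallest mean distance between the samples of two different clusters.
    sep = min(D[np.ix_(a, b)].mean()
              for i, a in enumerate(groups) for b in groups[i + 1:])
    return float(np.mean(tight) / sep)


def best_cluster_number(X, k_max=None):
    """Scan K = 2..k_max and return the K with the smallest index."""
    D = pairwise_distances(X)
    if k_max is None:
        k_max = int(np.sqrt(len(X)))  # rule-of-thumb upper bound (assumed)
    scores = {}
    for k in range(2, k_max + 1):
        labels, _ = kmeans(X, variance_seeds(X, k, D))
        scores[k] = validity_index(D, labels, k)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Synthetic demo: three well-separated spherical blobs.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(c, 0.3, size=(40, 2))
                   for c in [(0, 0), (8, 0), (4, 7)]])
    best_k, scores = best_cluster_number(X)
    print("best K:", best_k)
```

The separation rule is the main design choice here: taking literally the K lowest-variance samples can place every seed inside the same dense cluster, so some spacing constraint seems necessary in practice. The sketch also keeps the full n × n distance matrix in memory, which suits small data sets such as the UCI benchmarks mentioned above but not large ones.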

Keywords: K-means; number of clusters; validity index; cluster analysis

CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
