Improved K-means algorithm with aggregation distance coefficient  Cited by: 28


Authors: WANG Qiaoling; QIAO Fei[1]; JIANG Youhao (School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China)

Affiliation: [1] School of Electronics and Information Engineering, Tongji University

Source: Journal of Computer Applications (《计算机应用》), 2019, No. 9, pp. 2586-2590 (5 pages)

Funding: Major Program of the National Natural Science Foundation of China (71690230, 71690234)

Abstract: The traditional K-means clustering algorithm selects its initial centers and the value of K at random, which makes clustering results uncertain and limits their precision. To address this, an improved K-means algorithm based on aggregation distance is proposed. First, high-quality initial cluster centers are selected using the aggregation distance coefficient and passed to the K-means algorithm. Then, the Davies-Bouldin Index (DBI) is introduced as the criterion function of the algorithm, and the clustering is updated iteratively until the criterion function converges, at which point clustering is complete. The improved algorithm supplies good initial cluster centers and a value for K, which avoids the randomness of the clustering results. Clustering results on two-dimensional numerical simulation data show that the improved algorithm still maintains a good clustering effect when the number of data samples reaches 10,000. On the two UCI standard datasets Iris and Seg, the improved algorithm raises the adjusted Rand coefficient by 83.7% and 71.0%, respectively, over the traditional algorithm, which indicates that it produces more accurate clustering results than the traditional algorithm.
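The abstract relies on two standard measures: the Davies-Bouldin Index as the criterion function and the adjusted Rand coefficient for evaluation. The sketch below shows their textbook definitions in plain Python, as an illustration only. It is not the authors' implementation and does not cover the aggregation distance coefficient, which this record does not define. The function names and the toy data are assumptions made for the example, and the DBI function assumes no two cluster centers coincide.

```python
import math
from collections import Counter


def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin_index(points, labels):
    """Davies-Bouldin Index: lower means more compact, better separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")
    centers = {lab: centroid(ps) for lab, ps in clusters.items()}
    # Scatter S_i: mean distance from each member to its cluster center.
    scatter = {
        lab: sum(math.dist(p, centers[lab]) for p in ps) / len(ps)
        for lab, ps in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Largest similarity ratio between cluster i and any other cluster j.
        # Assumes distinct centers, otherwise the division fails.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index: 1.0 for identical partitions, near 0 for random ones."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pair counts from the contingency table and its row and column sums.
    sum_cells = sum(math.comb(c, 2)
                    for c in Counter(zip(true_labels, pred_labels)).values())
    sum_rows = sum(math.comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(math.comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_rows * sum_cols / math.comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible grouping
mixed_labels = [0, 1, 0, 1, 0, 1]  # deliberately mixes the two groups

print("DBI good :", davies_bouldin_index(points, good_labels))
print("DBI mixed:", davies_bouldin_index(points, mixed_labels))
print("ARI good vs mixed:", adjusted_rand_index(good_labels, mixed_labels))
```

On this toy data the labeling that follows the two groups gets the lower DBI, which is the direction a DBI-based criterion rewards. The adjusted Rand index depends only on the partition, so renaming the cluster labels does not change it.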

Keywords: aggregation distance coefficient; cluster center; clustering evaluation index; Davies-Bouldin Index (DBI); data clustering

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
