Study on Improvement of Heuristic k-means Clustering Algorithm (cited by: 2)


Authors: YIN Lifeng [1]; LI Qingjie (School of Software, Dalian Jiaotong University, Dalian 116024, China)

Affiliation: [1] School of Software, Dalian Jiaotong University, Dalian, Liaoning 116028

Source: Journal of Dalian Jiaotong University, 2024, No. 2, pp. 115-119 (5 pages)

Funding: National Natural Science Foundation of China (61771087).

Abstract: The heuristic k-means clustering algorithm examines nearby clusters after the first k-means iteration to predict the subset of clusters to which each data point is likely to be assigned, which effectively speeds up the algorithm. However, because the heuristic algorithm selects its initial cluster centers at random and cannot effectively identify outliers in the data set, its clustering results have a large sum of squared errors and a small silhouette coefficient. To address this problem, the CHk-means algorithm is proposed. It introduces the careful seeding method to overcome the local-optimum problem caused by the random selection of initial cluster centers in heuristic k-means, and it introduces the local outlier factor (LOF) algorithm to detect outliers, reducing their influence on the clustering results. Comparative experiments with three algorithms on multiple data sets show that CHk-means effectively reduces the sum of squared errors of the clustering results, increases the silhouette coefficient, and markedly improves clustering quality.
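The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' CHk-means implementation: it assumes that "careful seeding" refers to k-means++ style seeding, uses a simplified LOF (distinct points, exactly `n_neighbors` neighbours, ties not expanded), and runs plain Lloyd iterations in place of the heuristic acceleration. The function names, the toy data, and the LOF threshold of 1.5 are all hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lof_scores(points, n_neighbors):
    """Simplified local outlier factor; scores well above 1 suggest outliers.
    Assumes all points are distinct (otherwise a density could be infinite)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the n_neighbors nearest neighbours of each point (self excluded).
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:n_neighbors]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = [len(knn[i]) / sum(max(k_dist[j], dist[i][j]) for j in knn[i])
           for i in range(n)]
    return [sum(lrd[j] for j in knn[i]) / (len(knn[i]) * lrd[i]) for i in range(n)]


def careful_seeding(points, k, rng):
    """k-means++ style seeding (assumed meaning of "careful seeding"): each new
    center is drawn with probability proportional to its squared distance
    from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def kmeans(points, centers, iters=50):
    """Plain Lloyd iterations (not the heuristic variant from the abstract)."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[idx].append(p)
        # Recompute each center as the mean of its cluster; keep empty ones as-is.
        centers = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
                   for cl, c in zip(clusters, centers)]
    return centers


def sse(points, centers):
    """Sum of squared errors: squared distance of each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Toy data (hypothetical): two tight clusters plus one distant outlier.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
cluster_b = [(x + 10.0, y + 10.0) for x, y in cluster_a]
outlier = (30.0, -20.0)
data = cluster_a + cluster_b + [outlier]

scores = lof_scores(data, n_neighbors=3)
inliers = [p for p, s in zip(data, scores) if s <= 1.5]  # 1.5 is an assumed threshold
rng = random.Random(0)
centers = kmeans(inliers, careful_seeding(inliers, 2, rng))
print("removed:", [p for p in data if p not in inliers])
print("SSE on inliers:", sse(inliers, centers))
```

Filtering with LOF before seeding keeps the distant point from being picked as an initial center, which distance-weighted seeding would otherwise favor. The silhouette coefficient mentioned in the abstract is not computed here.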

Keywords: clustering algorithm; k-means; heuristic algorithm; careful seeding; local outlier factor; outlier

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
