Initialization Improvement and Clustering Quality Evaluation of the K-means Algorithm

Cited by: 1

Authors: HE Xuansen (何选森); HE Fan (何帆); YU Hailan (于海澜)

Affiliations: [1] School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363, Guangdong, China; [2] College of Information Science and Engineering, Hunan University, Changsha 410082, Hunan, China; [3] School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China

Source: Journal of Xi’an Polytechnic University (《西安工程大学学报》), 2024, No. 6, pp. 114-123 (10 pages)

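Funding: Special Project in Key Fields of Guangdong Provincial Universities (2021ZDZX1035); Guangdong Province Science and Technology Innovation Strategy Special Fund (pdjh2022b0598).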

Abstract: To address the random-initialization problem of the K-means algorithm, an improved scheme is proposed. Data dimensionality is reduced by standardizing the features and applying principal component analysis (PCA). The initial centroids are determined by farthest-centroid selection and the min-max distance rule. To obtain the inherent number of clusters in the data, a rule of thumb and the elbow method are used, and silhouette analysis is used to evaluate clustering quality. Simulation results show that the average λ test statistic of the other algorithms is 2.72 times that of the proposed scheme, and that the clustering error after the improvement is reduced by 6.04%.
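The abstract names the stages of the scheme but does not give their formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It z-scores the features, seeds K-means with a generic farthest-first (max-min distance) rule, runs plain Lloyd iterations, and scores the result with the mean silhouette coefficient. The PCA step, the elbow method, and the rule of thumb are omitted for brevity. All function names (`standardize`, `maximin_init`, `kmeans`, `mean_silhouette`), the toy data, and the choice of the first seed (the point farthest from the data mean) are assumptions made for illustration.

```python
import math


def standardize(rows):
    """Z-score each feature column (population standard deviation)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero deviation; divide by 1.0 instead of 0.0.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def maximin_init(points, k):
    """Choose k initial centroids by a farthest-first (max-min distance) rule."""
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    # Assumption: the first seed is the point farthest from the data mean.
    first = max(range(len(points)), key=lambda i: math.dist(points[i], mean))
    chosen = [first]
    # Distance from every point to its nearest already-chosen seed.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt]))
                   for d, p in zip(nearest, points)]
    return [list(points[i]) for i in chosen]


def kmeans(points, init_centroids, iters=100):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; O(n^2), assumes at least two clusters."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        avg = {}
        for c in clusters:
            ds = [math.dist(p, q)
                  for j, (q, l) in enumerate(zip(points, labels))
                  if l == c and j != i]
            avg[c] = sum(ds) / len(ds) if ds else None
        a = avg[labels[i]]
        if a is None:  # singleton cluster contributes a silhouette of 0
            continue
        b = min(v for c, v in avg.items() if c != labels[i] and v is not None)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy data (made up for illustration): three well-separated 2-D groups.
data = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
        [5.0, 5.0], [5.2, 4.8], [4.8, 5.2],
        [9.0, 1.0], [9.2, 0.8], [8.8, 1.2]]
X = standardize(data)
labels, centroids = kmeans(X, maximin_init(X, 3))
print(labels, round(mean_silhouette(X, labels), 3))
```

Because every seed after the first is the point farthest from its nearest existing seed, the seeding is deterministic and tends to start with one centroid per well-separated group, which is the motivation for replacing random initialization. The same rule is sensitive to outliers, so a practical version might trim extreme points first.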

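Keywords: K-means algorithm; principal component analysis; farthest centroid selection; min-max distance rule; rule of thumb; elbow method; silhouette analysis; clustering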

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
