Automatically Obtaining K Value Based on K-means Elbow Method  Cited by: 69

Authors: WU Guang-jian; ZHANG Jian-lin[1]; YUAN Ding (Hangzhou Normal College, Alibaba Business University, Hangzhou 311100)

Affiliation: [1] Alibaba Business School, Hangzhou Normal University

Source: Software (《软件》), 2019, No. 5, pp. 167-170 (4 pages)

Abstract: The typical K-means algorithm that uses the elbow method to select a suitable K value is widely used in practical projects. However, obtaining K with the elbow method is poorly automated, and its efficiency on massive data sets needs improvement. This paper proposes drawing the straight line that connects the first and last points of the elbow-method plot and, over the range of K values, finding the maximum difference between the line's y value and the sum of squared errors; the K value at this maximum difference is taken as the optimal elbow point. Because the elbow method requires multiple iterations and the density of the data set has little effect on the plot, the paper further proposes pre-sampling the data set and deploying the program on the Spark platform to obtain the elbow-point K value automatically. In this way, the optimal K value for K-means is obtained automatically and the processing efficiency for large data sets is improved.
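The elbow-point rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `elbow_k`, the sample SSE values, and the use of the raw (unnormalized) vertical difference are assumptions made for illustration, and the pre-sampling and Spark deployment steps are not shown.

```python
def elbow_k(ks, sse):
    """Return (K, gap) for the K whose SSE lies farthest below the straight
    line joining the first and last points of the SSE-versus-K curve.

    ks  : increasing list of candidate K values
    sse : sum of squared errors measured for each K (same length as ks)
    """
    if len(ks) != len(sse) or len(ks) < 3:
        raise ValueError("need at least three (K, SSE) pairs of equal length")

    k_first, k_last = ks[0], ks[-1]
    s_first, s_last = sse[0], sse[-1]
    # Slope of the line through the first and last points of the curve.
    slope = (s_last - s_first) / (k_last - k_first)

    best_k, best_gap = ks[0], float("-inf")
    for k, s in zip(ks, sse):
        line_y = s_first + slope * (k - k_first)  # y value of the line at this K
        gap = line_y - s                          # vertical difference to the SSE
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k, best_gap


# Hypothetical SSE values for K = 1..6 (illustrative numbers, not from the paper).
ks = [1, 2, 3, 4, 5, 6]
sse = [1000.0, 400.0, 150.0, 120.0, 100.0, 90.0]
best_k, best_gap = elbow_k(ks, sse)
print(best_k, best_gap)
```

In practice the `sse` list would come from running K-means once per candidate K, for example on a sample of the data set, and recording the sum of squared errors each time.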

Keywords: K-means algorithm; clustering K value; elbow method; sum of squared errors; elbow point

Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
