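Authors: 吴广建 (WU Guang-jian), 章剑林 (ZHANG Jian-lin)[1], 袁丁 (YUAN Ding) (Hangzhou Normal College, Alibaba Business University, Hangzhou 311100)
Affiliation: [1] Alibaba Business School, Hangzhou Normal University (杭州师范大学阿里巴巴商学院)
Source: 《软件》 (Software), 2019, No. 5, pp. 167-170 (4 pages)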
Abstract: The K-means algorithm with the elbow method for choosing K is widely used in practical projects, but the elbow method offers little automation in picking K, and its efficiency on massive data sets leaves room for improvement. This paper draws a straight line through the first and last points of the elbow plot (sum of squared errors against K) and, over the range of K, finds the largest difference between the line's y value and the sum of squared errors; the K at that largest difference is taken as the optimal elbow point. Because the elbow method requires many iterations and the density of the data set has little effect on the plot, the paper further proposes pre-sampling the data set and deploying the program on the Spark platform to obtain the elbow K automatically. In this way the optimal K for K-means is obtained automatically and the processing efficiency on large data sets is improved.
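Keywords: K-means algorithm; cluster number K; elbow method; sum of squared errors; elbow point
Classification: TP301.6 [Automation and Computer Technology: Computer Architecture]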
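
The elbow-point rule described in the abstract can be sketched roughly as follows. This is a minimal illustration based only on the abstract, not the authors' implementation: the function name `elbow_k` and the sample SSE values are hypothetical, and the pre-sampling and Spark deployment steps are not shown. The sketch assumes the sum of squared errors (SSE) has already been computed for each candidate K.

```python
def elbow_k(ks, sse):
    """Pick the K whose SSE lies farthest below the straight line that
    joins the first and last points of the elbow plot (K vs. SSE)."""
    if len(ks) != len(sse) or len(ks) < 3:
        raise ValueError("need at least three (K, SSE) pairs of equal length")

    k0, k1 = ks[0], ks[-1]
    s0, s1 = sse[0], sse[-1]
    slope = (s1 - s0) / (k1 - k0)  # slope of the line through the endpoints

    best_k, best_gap = ks[0], float("-inf")
    for k, s in zip(ks, sse):
        line_y = s0 + slope * (k - k0)  # y value of the line at this K
        gap = line_y - s                # difference between line and SSE
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


# Illustrative SSE values only; these are not data from the paper.
ks = list(range(1, 9))
sse = [1000.0, 520.0, 260.0, 120.0, 100.0, 88.0, 80.0, 75.0]
print(elbow_k(ks, sse))  # prints 4
```

In practice the `sse` list would come from running K-means once per candidate K, which is the repeated work the abstract proposes to reduce by sampling the data and running on Spark.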