Improvement of the distributed k-means clustering algorithm (cited by: 3)

Optimization of the k-means clustering algorithm in the Hadoop distributed computing framework

Source: Journal of Guangxi University (Natural Science Edition), 2014, No. 5, pp. 1060-1065 (6 pages)

Funding: Guangxi Natural Science Foundation (2013GXNSFAA253003)

Abstract: The classic distributed k-means clustering algorithm selects its initial cluster centers at random and then iterates many times, which tends to cause low clustering efficiency, heavy network traffic, and unstable clustering results. To address these problems, an improved distributed k-means clustering algorithm is proposed. The algorithm partitions the data set and takes the k data blocks whose attributes are most dense as the cluster centers, so that the centers are representative of the data; this in turn reduces the number of iterations and improves clustering efficiency. Experiments on the Hadoop distributed platform show that the improved algorithm reduces both the number of iterations and the convergence time.
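The abstract does not spell out how the data set is partitioned or how density is measured, so the following is only a minimal single-machine sketch of the general idea, under stated assumptions: the data are split into an equal-width grid of blocks, density is taken to be the number of points in a block, and each initial center is the centroid of one of the k most populated blocks. The function name `densest_block_centers` and the `bins` parameter are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def densest_block_centers(points, k, bins=4):
    """Return k initial centers: centroids of the k most populated grid blocks.

    Assumption (not from the paper): blocks are cells of an equal-width grid
    with `bins` cells per dimension, and density is the point count per cell.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def block_of(p):
        # Map a point to the index of the grid cell that contains it.
        idx = []
        for d in range(dims):
            span = highs[d] - lows[d]
            if span == 0:
                idx.append(0)
            else:
                i = int((p[d] - lows[d]) / span * bins)
                idx.append(min(i, bins - 1))  # keep the maximum value in the last cell
        return tuple(idx)

    # Group points by block; in a MapReduce setting this grouping would be
    # the shuffle step, with the block index as the key.
    blocks = defaultdict(list)
    for p in points:
        blocks[block_of(p)].append(p)

    if len(blocks) < k:
        raise ValueError("fewer non-empty blocks than k; increase bins")

    # Rank blocks by point count (densest first); ties are broken by block index.
    ranked = sorted(blocks.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    centers = []
    for _, members in ranked[:k]:
        n = len(members)
        centers.append(tuple(sum(m[d] for m in members) / n for d in range(dims)))
    return centers
```

Calling `densest_block_centers(points, k=2)` on a list of coordinate tuples returns two centers that can seed the usual k-means assignment and update iterations in place of random seeds. Adjacent dense cells may yield centers that lie close together, so a real implementation would likely need a merging or spacing rule that this sketch omits.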

Keywords: k-means clustering; distributed algorithm; MapReduce computing model; cluster centers

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]

