Divide-and-conquer k-means clustering method based on MapReduce (cited by: 8)


Authors: ZANG Yan-hui [1]; XI Yun-jiang [2]; ZHAO Xue-zhang [1]

Affiliations: [1] School of Electronic Information, Foshan Polytechnic, Foshan 528137, Guangdong, China; [2] School of Economics and Management, South China University of Technology, Guangzhou 510000, Guangdong, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2020, No. 5, pp. 1345-1351 (7 pages)

Funding: General Program of the National Natural Science Foundation of China (71371077); Foshan Science and Technology Program (2015AB004241).

Abstract: To address the long execution time and poor clustering quality of the original k-means method under the MapReduce model, a divide-and-conquer k-means clustering method based on MapReduce is proposed. A divide-and-conquer strategy is used to process large data sets: the whole data set is split into smaller blocks, and each block is stored in the main memory of one machine. The blocks are spread across the available machines, and each block is clustered independently by the machine it is assigned to. The minimum weighted distance is used to determine the cluster to which each data point should be assigned, and convergence is then checked. Experimental results show that, compared with the traditional k-means clustering method and the streaming k-means clustering method, the proposed method runs in less time and produces better clustering results.
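The divide-and-conquer idea in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation. The function names (`weighted_kmeans`, `map_chunk`, `reduce_centroids`, `divide_and_conquer_kmeans`) are hypothetical, and the block-splitting rule and the size-weighted merge of local centroids are assumptions. Points are assigned by plain Euclidean distance, because the record does not define the paper's "minimum weighted distance".

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]


def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd-style k-means in which every point carries a weight."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct positions as initial centers
    dim = len(points[0])
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [(p, w) for p, w, l in zip(points, weights, labels) if l == j]
            total = sum(w for _, w in members)
            if total == 0:
                new_centers.append(centers[j])  # keep the center of an empty cluster
            else:
                new_centers.append(tuple(
                    sum(p[d] * w for p, w in members) / total for d in range(dim)))
        if new_centers == centers:  # convergence: centers stopped moving
            break
        centers = new_centers
    labels = assign(points, centers)
    sizes = [sum(w for w, l in zip(weights, labels) if l == j) for j in range(k)]
    return centers, sizes


def map_chunk(chunk, k):
    """'Map' step (assumed): cluster one block independently, emit (centroid, size)."""
    centers, sizes = weighted_kmeans(chunk, [1] * len(chunk), k)
    return [(c, s) for c, s in zip(centers, sizes) if s > 0]


def reduce_centroids(summaries, k):
    """'Reduce' step (assumed): cluster the local centroids, weighted by size."""
    pairs = [pair for summary in summaries for pair in summary]
    centers, _ = weighted_kmeans([c for c, _ in pairs], [s for _, s in pairs], k)
    return centers


def divide_and_conquer_kmeans(data, k, n_chunks):
    chunks = [data[i::n_chunks] for i in range(n_chunks)]  # assumed split rule
    return reduce_centroids([map_chunk(c, k) for c in chunks], k)


# Toy data: two well-separated groups of four points each.
data = [(0, 0), (0, 1), (1, 1), (1, 2),
        (10, 10), (10, 11), (11, 11), (11, 12)]
print(sorted(divide_and_conquer_kmeans(data, k=2, n_chunks=2)))
```

In a real MapReduce job, `map_chunk` would run on the machine that holds the block and only the small (centroid, size) summaries would be sent to the reducer, so each machine only ever clusters its own block.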

Keywords: data clustering; MapReduce-based clustering; divide and conquer; big data; k-means

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
