Authors: 臧艳辉 (ZANG Yan-hui)[1], 席运江 (XI Yun-jiang)[2], 赵雪章 (ZHAO Xue-zhang)[1]
Affiliations: [1] School of Electronic Information, Foshan Polytechnic, Foshan, Guangdong 528137, China; [2] School of Economics and Management, South China University of Technology, Guangzhou, Guangdong 510000, China
Source: Computer Engineering and Design (《计算机工程与设计》), 2020, No. 5, pp. 1345-1351 (7 pages)
Funding: National Natural Science Foundation of China, General Program (71371077); Foshan Science and Technology Plan Project (2015AB004241).
Abstract: When the original k-means method is modeled in MapReduce, it suffers from long execution times and poor clustering results. To address these problems, a divide-and-conquer k-means clustering method based on MapReduce is proposed. Divide and conquer is used to process large data sets: the whole data set is split into smaller blocks, each stored in the main memory of one machine. The blocks are distributed across the available machines, and each machine clusters its assigned block independently. The minimum weighted distance is then used to determine the cluster to which each data point should be assigned, and convergence is checked. Experimental results show that, compared with the traditional k-means method and the streaming k-means method, the proposed method runs in less time and produces better clustering results.
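As a rough illustration of the pipeline the abstract describes, the standard-library Python sketch below splits the data into blocks, clusters each block independently (the work a MapReduce mapper would do), and then merges the per-block centroids. The abstract does not define its "minimum weighted distance", so the merge step here is an assumption: it runs k-means over the local centroids, weighting each centroid by the number of points it represents. The function names `weighted_kmeans` and `divide_and_conquer_kmeans` and the round-robin block split are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centers: List[List[float]]) -> int:
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def weighted_kmeans(points, weights, k, iters=100, tol=1e-12, seed=0):
    """Lloyd's k-means in which every point carries a weight (e.g. a count).

    Returns (centers, sizes), where sizes[j] is the total weight assigned to
    center j. Requires len(points) >= k.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    dim = len(centers[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest(p, centers)  # assign by minimum distance
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        shift = 0.0
        for j in range(k):
            if totals[j] > 0:  # an empty cluster keeps its old center
                new = [s / totals[j] for s in sums[j]]
                shift = max(shift, sq_dist(new, centers[j]))
                centers[j] = new
        if shift <= tol:  # converged: no center moved noticeably
            break
    sizes = [0.0] * k
    for p, w in zip(points, weights):
        sizes[nearest(p, centers)] += w
    return centers, sizes


def divide_and_conquer_kmeans(data, k, n_blocks, seed=0):
    # Divide: deal points round-robin into n_blocks blocks (hypothetical split;
    # in MapReduce each block would sit in one machine's main memory).
    blocks = [data[i::n_blocks] for i in range(n_blocks)]
    # Map (simulated sequentially here): cluster each block on its own and
    # emit (local centroid, number of points it represents) pairs.
    local_centers, local_weights = [], []
    for b, block in enumerate(blocks):
        centers, sizes = weighted_kmeans(block, [1.0] * len(block), k, seed=seed + b)
        for c, s in zip(centers, sizes):
            if s > 0:
                local_centers.append(c)
                local_weights.append(s)
    # Reduce (assumed merge rule): weighted k-means over the local centroids.
    global_centers, _ = weighted_kmeans(local_centers, local_weights, k, seed=seed)
    return global_centers
```

For example, `divide_and_conquer_kmeans(points, k=2, n_blocks=4)` returns two global centroids, provided every block holds at least k points. The point of this structure is that only k (centroid, size) pairs per block reach the merge step, so the reduce side stays small however large each block is; the sketch runs the blocks one after another, whereas a real deployment would give each block to a separate worker.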
Keywords: data clustering; MapReduce-based clustering; divide and conquer; big data; k-means
CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]