Research on a Coarse-Grained Parallel Clustering Algorithm for Big Data Sets (cited by: 7)

A Coarse-Grained Clustering Unit Based Parallel Algorithm for Big Data Set

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2014, No. 10, pp. 2370-2374 (5 pages)

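Funding: National Natural Science Foundation of China (61303029); Wuhan Science and Technology Innovation Team Project (201307020402005); Fundamental Research Funds for the Central Universities (2013-IV-054; 145210007)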

Abstract: With the arrival of the big data era and the explosive growth in data volume, traditional clustering algorithms face great challenges. To improve clustering efficiency, this paper designs and implements a parallel Partitioning Around Medoid (PAM) clustering algorithm on the Hadoop platform. From the perspective of optimizing clustering units and cluster centers, and drawing on the core idea of visual clustering, it also proposes a Coarse-Grained Clustering Unit Strategy. Several groups of comparative experiments show that, with this strategy applied, the algorithm improves by more than 6% in running efficiency and computing capacity, and that the parallel algorithm achieves good speed-up ratio, expansion ratio, and flex ratio. The results lay a foundation for future clustering analysis on big data sets.
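For readers unfamiliar with PAM, the following is a minimal single-machine sketch of its medoid-swap idea. It is not the paper's Hadoop implementation and does not include the coarse-grained clustering unit strategy, whose details are not given in the abstract. The function names `total_cost` and `pam`, the naive initialisation, and the greedy first-improvement swap rule are all assumptions made for illustration.

```python
from itertools import product


def total_cost(points, medoids, dist):
    # Sum of each point's distance to its nearest medoid.
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(points, k, dist, max_iter=100):
    # Naive initialisation (assumption): take the first k points as medoids.
    medoids = list(points[:k])
    best = total_cost(points, medoids, dist)
    for _ in range(max_iter):
        improved = False
        # Try replacing medoid i with each non-medoid candidate point.
        for i, cand in product(range(k), points):
            if cand in medoids:
                continue
            trial = medoids[:i] + [cand] + medoids[i + 1:]
            cost = total_cost(points, trial, dist)
            if cost < best:
                # Greedy variant: accept the first swap that lowers the cost.
                medoids, best, improved = trial, cost, True
        if not improved:
            break  # no swap helps any more: a local optimum
    return medoids, best


if __name__ == "__main__":
    data = [1, 2, 3, 10, 11, 12]
    medoids, cost = pam(data, 2, lambda a, b: abs(a - b))
    print(sorted(medoids), cost)
```

The cost evaluation is a sum over independent points, which is the kind of work a MapReduce job could plausibly split across nodes; how the paper actually partitions it is not stated in the abstract.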

Keywords: cloud computing; big data; PAM; coarse-grained; Hadoop

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
