Multi-samples Merging Clustering Algorithms Combining PAM Algorithm and Careful Seeding Based on MapReduce

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2017, No. 10, pp. 2281-2285 (5 pages)

Funding: Supported by the Natural Science Foundation of Jiangsu Province (BK20140165) and the China Scholarship Council (201308320030).

Abstract: The traditional PAM (Partitioning Around Medoids) algorithm has high time complexity and is therefore inefficient on large data sets. In recent years, a growing number of researchers have used the MapReduce model to improve the performance of clustering algorithms. However, during iterative computation MapReduce must repeatedly restart jobs, re-read data from the file system, and shuffle data, all of which reduce processing efficiency. This paper proposes two MapReduce-based clustering models that combine the PAM algorithm with careful seeding, significantly improving performance while preserving the clustering validity of PAM. Performance and clustering-validity experiments show that the proposed methods achieve the expected results: they are efficient, robust, and scalable.
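The abstract does not spell out the algorithms, so the following Python sketch only illustrates the ingredients it names, under explicit assumptions. "Careful seeding" is assumed to mean k-means++-style D² sampling; the PAM swap phase is started from those seeds instead of PAM's usual BUILD step; Euclidean distance is assumed; and the "multi-sample merging" over MapReduce is imitated sequentially by clustering several random samples (a stand-in for map tasks) and then clustering the union of their medoids (a stand-in for the reduce task). The function names and this map/reduce split are hypothetical and are not the authors' published method.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def careful_seeding(points: Sequence[Point], k: int, rng: random.Random) -> List[int]:
    """Assumed D^2 (k-means++-style) seeding: indices of k initial medoids."""
    seeds = [rng.randrange(len(points))]
    while len(seeds) < k:
        # Squared distance from each point to its nearest already-chosen seed.
        d2 = [min(math.dist(p, points[s]) for s in seeds) ** 2 for p in points]
        if sum(d2) == 0:  # every point coincides with a seed; take any unused index
            seeds.append(next(i for i in range(len(points)) if i not in seeds))
        else:  # far-away points are proportionally more likely to be picked
            seeds.append(rng.choices(range(len(points)), weights=d2, k=1)[0])
    return seeds


def total_cost(points: Sequence[Point], medoids: List[int]) -> float:
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """PAM swap phase (best-improvement), started from careful seeding instead of BUILD."""
    if not 0 < k <= len(points):
        raise ValueError("need 0 < k <= len(points)")
    medoids = careful_seeding(points, k, rng)
    best = total_cost(points, medoids)
    while True:
        best_swap = None
        # Try every (medoid, non-medoid) swap and remember the cheapest one.
        for i in range(k):
            for h in range(len(points)):
                if h in medoids:
                    continue
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                cost = total_cost(points, candidate)
                if cost < best:
                    best, best_swap = cost, candidate
        if best_swap is None:  # no swap lowers the cost: local optimum reached
            return [points[m] for m in medoids]
        medoids = best_swap


def multi_sample_clustering(points: List[Point], k: int, num_samples: int,
                            sample_size: int, seed: int = 0) -> List[Point]:
    """Hypothetical, sequential stand-in for a map/reduce multi-sample merge."""
    rng = random.Random(seed)
    candidates: List[Point] = []
    for s in range(num_samples):
        # "Map": each (simulated) task clusters its own uniform random sample.
        sample = rng.sample(points, sample_size)
        candidates.extend(pam(sample, k, random.Random(seed + s + 1)))
    # "Reduce": cluster the union of the per-sample medoids down to k medoids.
    return pam(candidates, k, random.Random(seed))


if __name__ == "__main__":
    rng = random.Random(42)
    centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
    data = [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
            for cx, cy in centers for _ in range(30)]
    print(multi_sample_clustering(data, k=3, num_samples=4, sample_size=30))
```

The design point this sketch tries to convey is that the expensive O(k(n-k)) swap search runs only on small samples and on the short list of merged medoids, never on the full data set, which is one plausible reading of how sampling can cut PAM's cost. A real MapReduce job would run the per-sample step in parallel map tasks rather than in a loop.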

Keywords: PAM clustering algorithm; MapReduce; probability sampling; performance; clustering validity

Chinese Library Classification (CLC) number: TP311 [Automation and Computer Technology: Computer Software and Theory]
