Improved K-means Clustering Algorithm Based on MapReduce Framework  (Cited by: 7)

Authors: SONG Yang (宋阳); SHI Hong-yan (石鸿雁) [1] (School of Science, Shenyang University of Technology, Shenyang 110870, China)

Affiliation: [1] School of Science, Shenyang University of Technology

Source: Computer and Modernization (《计算机与现代化》), 2019, No. 8, pp. 28-32, 43 (6 pages in total)

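Funding: National Natural Science Foundation of China (61074005); Liaoning Province Support Program for Excellent Scientific and Technological Talents in Higher Education Institutions (LR2012005)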

Abstract: To improve the clustering quality and speed of the K-means algorithm on massive data sets, a distributed, parallel K-means programming model based on the MapReduce framework is proposed. First, to address the sensitivity of K-means to initialization, a new dissimilarity function is introduced: the value of k is determined from the degree of dissimilarity between data points, and points with smaller dissimilarity are selected as the initial cluster centers. The K-means algorithm is then deployed on the MapReduce programming model, and the MapReduce programming model is itself improved to speed up the processing of massive data. Experiments show that, compared with the traditional K-means algorithm, the improved K-means algorithm under the MapReduce framework achieves better accuracy and convergence time, and that the parallel clustering model scales well across different data sizes and numbers of compute nodes.
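
For orientation, the general way K-means is usually split into map and reduce steps can be sketched as follows. This is a minimal single-process illustration of the standard scheme, not the paper's method: the dissimilarity function and the MapReduce improvements are not given in the abstract, so the initial centers are simply passed in, and the names `map_phase`, `reduce_phase`, and `kmeans` are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance (Python 3.8+)


def map_phase(points, centers):
    """Map step: emit (index of nearest center, (point, 1)) for each point."""
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        yield nearest, (p, 1)


def reduce_phase(pairs, centers):
    """Reduce step: sum points per cluster and return the updated centers."""
    sums = {}
    counts = defaultdict(int)
    for cid, (p, n) in pairs:
        if cid in sums:
            sums[cid] = [s + x for s, x in zip(sums[cid], p)]
        else:
            sums[cid] = list(p)
        counts[cid] += n
    new_centers = list(centers)  # a cluster with no points keeps its old center
    for cid, total in sums.items():
        new_centers[cid] = tuple(s / counts[cid] for s in total)
    return new_centers


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Repeat map + reduce until the centers move by at most tol."""
    # Assumption: `centers` comes from some initialization step; the paper's
    # dissimilarity-based selection is not reproduced here.
    for _ in range(max_iter):
        new_centers = reduce_phase(map_phase(points, centers), centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 0), (10, 2)]
    print(kmeans(data, [(0, 0), (10, 0)]))
```

In a real Hadoop job the map step would run on separate data splits and the (cluster index, partial sum) pairs would be shuffled to reducers by key; here both steps run in one process purely to show the data flow.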

Keywords: K-means algorithm; dissimilarity function; MapReduce model

CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
