Multi-kernel FOA-Kmeans Clustering Algorithm Based on Hadoop Platform (cited by: 1)


Authors: Li Xiao-chuan; Liu Yuan-hua [1] (Business School, University of Shanghai for Science and Technology, Shanghai 200093, China)

Affiliation: [1] Business School, University of Shanghai for Science and Technology, Shanghai 200093, China

Source: Software Guide (《软件导刊》), 2018, No. 4, pp. 51-53, 57 (4 pages)

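Funding: National Natural Science Foundation of China (11505114); Ministry of Education Humanities and Social Sciences Research General Project (12YJC630127)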

Abstract: To address the low efficiency of the Kmeans algorithm when clustering massive data sets, a multi-kernel fruit fly-Kmeans clustering algorithm (MKFOA-Kmeans) is proposed, built on the distributed architecture of Hadoop. After each iteration, the fruit fly positions are used as cluster centers for one pass of Kmeans clustering, which combines the strong global search ability of the fruit fly optimization algorithm (FOA) with the strong local search ability of Kmeans. The MapReduce framework simplifies the execution of the algorithm and avoids failures caused by insufficient storage space. Simulation experiments on a Hadoop platform built from commodity hardware show that MKFOA-Kmeans clusters large data sets with high accuracy, and that its efficiency advantage becomes more pronounced as the data volume grows.
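The hybrid idea in the abstract (random fly positions for global search, one Kmeans pass per fly for local refinement) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the function names, the uniform random flight, the use of the sum of squared errors as the fitness ("smell concentration") and all parameter values are assumptions made for illustration. The map and reduce phases are ordinary in-process Python functions that only mimic the MapReduce data flow and do not use the Hadoop API. The multi-kernel aspect is not modeled, since the abstract does not describe it.

```python
import random
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def map_phase(points, centers):
    """Map: emit (cluster index, point) for every point."""
    for p in points:
        yield nearest(p, centers), p


def reduce_phase(pairs, centers):
    """Reduce: average each cluster's points; keep the old center if a cluster is empty."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centers = []
    for i, old in enumerate(centers):
        members = groups.get(i)
        if members:
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers


def kmeans_step(points, centers):
    """One Kmeans pass expressed as a map phase followed by a reduce phase."""
    return reduce_phase(map_phase(points, centers), centers)


def sse(points, centers):
    """Sum of squared errors, used here as the (assumed) fitness of a fly."""
    return sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)


def foa_kmeans(points, k, n_flies=8, n_iter=20, step=1.0, seed=0):
    """Hypothetical FOA-Kmeans loop: flies search around the best centers found so far."""
    rng = random.Random(seed)
    best = [tuple(p) for p in rng.sample(points, k)]
    best_cost = sse(points, best)
    for _ in range(n_iter):
        for _ in range(n_flies):
            # Global search: random flight around the current best location.
            fly = [tuple(x + rng.uniform(-step, step) for x in c) for c in best]
            # Local search: one Kmeans pass starting from the fly position.
            fly = kmeans_step(points, fly)
            cost = sse(points, fly)
            if cost < best_cost:
                best, best_cost = fly, cost
    return best, best_cost
```

Calling `foa_kmeans(points, k)` with a list of coordinate tuples returns the best set of centers found and its cost. Because a candidate is accepted only when it lowers the cost, the returned cost never exceeds that of the initial random centers. On a real cluster, `map_phase` and `reduce_phase` would be replaced by Hadoop map and reduce tasks that operate on data splits.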

Keywords: large-scale data clustering; Hadoop; fruit fly algorithm; multi-kernel; Kmeans algorithm

Classification: TP312 [Automation and Computer Technology: Computer Software and Theory]

 
