基于Hadoop二阶段并行模糊c-Means聚类算法  

HADOOP-BASED TWO-STAGE PARALLEL FUZZY C-MEANS CLUSTERING ALGORITHM

在线阅读下载全文

作  者:胡吉朝[1] 黄红艳[1] Hu Jichao;Huang Hongyan(School of Information Engineering, Shijiazhuang University of Economics, Shijiazhuang 050031 ,Hebei, China)

机构地区:[1]石家庄经济学院信息工程学院,河北石家庄050031

出  处:《计算机应用与软件》2016年第6期282-286,共5页Computer Applications and Software

基  金:河北省科技厅科技攻关项目(13210130)

摘  要:针对Mapreduce机制下算法通信时间占用比过高,实际应用价值受限的情况,提出基于Hadoop二阶段并行c-Means聚类算法用来解决超大数据的分类问题。首先,改进Mapreduce机制下的MPI通信管理方法,采用成员管理协议方式实现成员管理与Mapreduce降低操作的同步化;其次,实行典型个体组降低操作代替全局个体降低操作,并定义二阶段缓冲算法;最后,通过第一阶段的缓冲进一步降低第二阶段Mapreduce操作的数据量,尽可能降低大数据带来的对算法负面影响。在此基础上,利用人造大数据测试集和KDD CUP 99入侵测试集进行仿真,实验结果表明,该算法既能保证聚类精度要求又可有效加快算法运行效率。Aiming at the problem of too high occupancy of communication time and limited applying value of the algorithm under the mechanism of Mapreduce, we put forward a Hadoop-based two-stage parallel c-Means clustering algorithm to deal with the problem of extra-large data classification. First, we improved the MPI communication management method in Mapreduce mechanism, and used membership management protocol mode to realise the synchronisation of members management and Mapreduce reducing operation. Secondly,we implemented typical individuals group reducing operation instead of global individual reducing operation, and defined the two-stage buffer algorithm. Finally, through the buffer in first stage we further reduced the data amount of Mapreduce operation in second stage, and reduced the negative impact brought about by big data on the algorithm as much as possible. Based on this, we carried out the simulation by using artificial big data test set and KDD CUP 99 invasion test data. Experimental result showed that the algorithm could both guarantee the clustering precision requirement and speed up effectively the operation efficiency of algorithm.

关 键 词:二阶段 模糊c-Means 大数据 聚类 并行 入侵检测 

分 类 号:TP312[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象