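Affiliation: [1] School of Computer and Control Engineering, North University of China (中北大学), Taiyuan 030051
Source: Computer Measurement & Control (《计算机测量与控制》), 2015, No. 3, pp. 842-846 (5 pages)
Funding: National Natural Science Foundation of China (50976108); Natural Science Foundation of Shanxi Province (2012011011-3)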
Abstract: Algorithms built on the MapReduce mechanism often spend too large a share of their running time on communication, which limits their practical value. To address this problem, a two-stage parallel fuzzy c-Means clustering algorithm based on Hadoop is proposed. First, a membership management protocol is used to synchronize member management with the MapReduce reduce operation, which improves MPI communication management under the MapReduce mechanism. Second, a reduce operation over groups of typical individuals replaces the reduce operation over all individuals, and a two-stage buffering algorithm is defined: buffering in the first stage further cuts the volume of data handled by the second-stage MapReduce operation, limiting the negative effect of large data volumes on the algorithm as far as possible. Simulation experiments indicate that the algorithm performs well on big data. Its parallel rate and speedup are both better on large data sets than on small ones, which the authors take to show that the algorithm can adjust itself in real time according to the volume of data.
Keywords: two-stage; fuzzy c-Means; big data; data clustering; Hadoop
Classification number: TP312 [Automation and Computer Technology: Computer Software and Theory]
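The two-stage idea in the abstract (shrink the data in a first stage so that the second-stage reduce handles far less of it) can be illustrated with a minimal sketch of one fuzzy c-Means update. This is not the paper's algorithm: the membership management protocol and the selection of typical individuals are not described in this record, so the sketch substitutes the ordinary combiner pattern, in which each partition is collapsed to per-center partial sums before a final merge. The function names `memberships`, `stage_one`, `stage_two`, and `fcm_step`, and the fuzzifier default `m = 2.0`, are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, ...]
# Per-partition summary: (weighted coordinate sums per center, weight totals per center)
Partial = Tuple[List[List[float]], List[float]]


def memberships(x: Point, centers: List[Point], m: float = 2.0) -> List[float]:
    """Standard fuzzy c-Means membership of point x in each center."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [i for i, d in enumerate(d2) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(centers))]
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def stage_one(partition: List[Point], centers: List[Point], m: float = 2.0) -> Partial:
    """First stage (assumed combiner): collapse one partition into partial sums."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    weights = [0.0] * len(centers)
    for x in partition:
        for i, u in enumerate(memberships(x, centers, m)):
            w = u ** m
            weights[i] += w
            for k in range(dim):
                sums[i][k] += w * x[k]
    return sums, weights


def stage_two(partials: List[Partial], old_centers: List[Point]) -> List[Point]:
    """Second stage: merge the small partials and emit the updated centers."""
    c, dim = len(old_centers), len(old_centers[0])
    sums = [[0.0] * dim for _ in range(c)]
    weights = [0.0] * c
    for p_sums, p_weights in partials:
        for i in range(c):
            weights[i] += p_weights[i]
            for k in range(dim):
                sums[i][k] += p_sums[i][k]
    # Keep the old center if it attracted no weight at all.
    return [
        tuple(s / weights[i] for s in sums[i]) if weights[i] > 0.0 else old_centers[i]
        for i in range(c)
    ]


def fcm_step(partitions: List[List[Point]], centers: List[Point], m: float = 2.0) -> List[Point]:
    """One center update: stage one per partition, then a single stage-two merge."""
    return stage_two([stage_one(p, centers, m) for p in partitions], centers)
```

In this sketch the second stage receives one small summary per partition rather than every individual point, so its input grows with the number of partitions and centers, not with the number of records. Calling `fcm_step` repeatedly until the centers stop moving gives the usual iterative procedure.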