Authors: YANG Yanqing (杨延庆); YUAN Huabing (袁华兵). Division of Information Technology, Xi'an Medical University, Xi'an 710021
Source: Computer & Digital Engineering (《计算机与数字工程》), 2020, No. 7, pp. 1564-1567, 1765 (5 pages)
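Funding: Supported by the Shaanxi Province Youth Science Foundation Project (No. 71701160) and the Xi'an Medical University Teaching Reform Research Project (No. 2018JG-07).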
Abstract: The fuzzy K-means algorithm is a soft clustering algorithm that quantitatively determines the degree of affinity between objects. Because the algorithm has shortcomings when analyzing and processing large-scale data, this paper proposes a parallel implementation based on the MapReduce programming model. First, before the output of the Map function is passed to the Reduce function on other nodes, the design of the Combine function is improved to process intermediate results locally, which reduces communication overhead and speeds up the MapReduce job. Then, several data sets of different sizes are tested on the Hadoop distributed computing platform. The experiments show that the MapReduce-based parallel fuzzy K-means algorithm is suitable for analyzing and processing large-scale data, that execution speed increases by about 1.9 times, and that the clustering effect is more pronounced.
Keywords: fuzzy K-means; MapReduce model; Combine function; Hadoop platform
Classification: TP301 [Automation and Computer Technology: Computer System Architecture]
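
The Map, Combine and Reduce steps named in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' Hadoop implementation, which the record does not show. It assumes the standard fuzzy c-means update with a fuzzifier `m = 2`, and the function names (`memberships`, `map_point`, `combine`, `reduce_centers`, `iterate`) are hypothetical. Each data split stands in for one node, and the combine step collapses that node's output to one partial sum per cluster before anything is "sent" to the reducer.

```python
import math

M = 2.0  # fuzzifier; an assumed value, not taken from the paper


def memberships(point, centers, m=M):
    """Fuzzy membership of one point in every cluster (standard FCM formula)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    for j, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if k == j else 0.0 for k in range(len(centers))]
    exp = 2.0 / (m - 1.0)
    inv = [d ** -exp for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


def map_point(point, centers, m=M):
    """Map: emit (cluster_id, (weighted_point, weight)) for every cluster."""
    for j, u in enumerate(memberships(point, centers, m)):
        w = u ** m
        yield j, ([w * x for x in point], w)


def combine(pairs):
    """Combine: add up partial sums per cluster on the local node."""
    acc = {}
    for j, (vec, w) in pairs:
        if j in acc:
            old_vec, old_w = acc[j]
            acc[j] = ([a + b for a, b in zip(old_vec, vec)], old_w + w)
        else:
            acc[j] = (list(vec), w)
    return list(acc.items())


def reduce_centers(pairs, centers):
    """Reduce: merge partial sums from all nodes and compute new centers."""
    merged = dict(combine(pairs))
    new_centers = []
    for j, old in enumerate(centers):
        vec, w = merged.get(j, (None, 0.0))
        # Keep the old center if no point carries weight for this cluster.
        new_centers.append([x / w for x in vec] if w > 0.0 else list(old))
    return new_centers


def iterate(splits, centers, m=M):
    """One MapReduce round; each split plays the role of one node."""
    shuffled = []
    for split in splits:
        local = (pair for p in split for pair in map_point(p, centers, m))
        shuffled.extend(combine(local))  # only one pair per cluster leaves the node
    return reduce_centers(shuffled, centers)
```

Calling `iterate` repeatedly until the centers stop moving gives the full clustering loop. Because the combine step only adds partial sums, the resulting centers match those of a run without it up to floating-point rounding; what changes is the number of pairs shuffled per node, from (points × clusters) down to (clusters).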