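Authors: SONG Yang (宋阳); SHI Hong-yan (石鸿雁) [1] (School of Science, Shenyang University of Technology, Shenyang 110870, China)
Affiliation: [1] School of Science, Shenyang University of Technology
Source: Computer and Modernization (《计算机与现代化》), 2019, No. 8, pp. 28-32, 43 (6 pages)
Funding: National Natural Science Foundation of China (61074005); Program for Liaoning Excellent Talents in University (LR2012005)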
Abstract: To improve the clustering quality and speed of the K-means algorithm on massive data, a distributed, parallel K-means programming model based on the MapReduce framework is proposed. First, to address the sensitivity of K-means to initialization, a new dissimilarity function is introduced: the value of k is determined from the degree of dissimilarity between data points, and points with smaller dissimilarity are selected as the initial cluster centers. The K-means algorithm is then deployed on the MapReduce programming model, and the MapReduce model itself is modified to speed up K-means on massive data. Experiments show that the improved K-means algorithm under MapReduce outperforms the traditional K-means algorithm in both accuracy and convergence time, and that the parallel clustering model scales well across different data sizes and numbers of compute nodes.
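Keywords: K-means algorithm; dissimilarity function; MapReduce model
Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]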
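For readers unfamiliar with how K-means maps onto MapReduce, the following is a minimal single-process sketch of the standard decomposition, not the authors' implementation. The map step assigns each point to its nearest center, and the reduce step averages the points in each group. The function names (`map_phase`, `reduce_phase`, `kmeans`) are hypothetical, the distance is plain Euclidean, and the initial centers are passed in by the caller. The paper's dissimilarity function for choosing k and the initial centers is not specified in this record, so it is not reproduced here.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centers):
    """Map: emit (index of nearest center, point) for every point."""
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        yield nearest, p


def reduce_phase(pairs, centers):
    """Reduce: replace each center with the mean of its assigned points."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centers = list(centers)  # centers with no points stay unchanged
    for idx, members in groups.items():
        new_centers[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centers


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Repeat map + reduce until no center moves by more than tol."""
    for _ in range(max_iter):
        new_centers = reduce_phase(map_phase(points, centers), centers)
        if all(dist(a, b) <= tol for a, b in zip(centers, new_centers)):
            return new_centers
        centers = new_centers
    return centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(pts, [(0, 0), (10, 10)]))
```

In a real MapReduce deployment the map calls would run on separate data partitions, and each iteration would be one job whose output centers are fed to the next job. This sketch runs both phases in one process purely to show the data flow.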