Authors: HU Jian (胡健) [1,2]; XU Kaibin (徐锴滨); MAO Yimin (毛伊敏)
Affiliations: [1] School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China; [2] Department of Information Engineering, College of Applied Science, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2020, No. 12, pp. 2094-2107 (14 pages)
Funding: National Key Research and Development Program of China, No. 2018YFC1504705; National Natural Science Foundation of China, No. 41562019; Science and Technology Project of the Jiangxi Provincial Department of Education, Nos. GJJ151528, GJJ151531.
Abstract: To address unreasonable grid partitioning, low clustering accuracy, and low parallelization efficiency in density-based clustering algorithms for big data, this paper proposes DBWGIE-MR, a density-based clustering algorithm that uses weighted grids and information entropy on MapReduce. First, an adaptive division grid (ADG) strategy partitions the data into grid cells adaptively. Second, a neighboring expand (NE) strategy builds the weighted grid for each data partition, strengthening the relevance between grids and improving clustering accuracy. Meanwhile, a weighted grid and information entropy (WGIE) strategy computes grid density and recalculates the ε-neighborhood and core objects of the density-based clustering algorithm so that it suits weighted grids. Then, combined with the MapReduce computing model, the COMCORE-MR algorithm (core clusters computing algorithm based on MapReduce) computes local clusters in parallel, speeding up their acquisition. Finally, the MECORE-MR algorithm (merge core cluster by using MapReduce), based on disjoint sets and MapReduce, accelerates the convergence of local-cluster merging and thereby improves the merging efficiency of density-based clustering. Experimental results show that DBWGIE-MR achieves better clustering results and better parallel performance on larger datasets.
Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]
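The abstract says that local clusters are merged with a disjoint-set (union-find) structure but gives no details. The following is only a minimal sequential sketch of that general idea, not the paper's MECORE-MR algorithm. It assumes, hypothetically, that two local clusters from different partitions belong together whenever they share at least one point id; the names `DisjointSet` and `merge_local_clusters` and the input layout are illustrative inventions.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen items as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def merge_local_clusters(local_clusters):
    """Merge local clusters that share a point (assumed merge criterion).

    local_clusters: dict mapping a hypothetical local cluster id to a set
    of point ids. Returns a sorted list of sorted point-id lists.
    """
    ds = DisjointSet()
    first_owner = {}  # point id -> first local cluster seen containing it
    for cid, points in local_clusters.items():
        ds.find(cid)  # make sure clusters with no shared points are kept
        for p in points:
            if p in first_owner:
                ds.union(first_owner[p], cid)
            else:
                first_owner[p] = cid
    merged = defaultdict(set)
    for cid, points in local_clusters.items():
        merged[ds.find(cid)] |= points
    return sorted(sorted(group) for group in merged.values())


# Illustrative data only: cluster ids encode a made-up partition/cluster pair.
local = {
    "p0-c0": {1, 2, 3},
    "p1-c0": {3, 4},
    "p1-c1": {7, 8},
    "p2-c0": {4, 5},
}
print(merge_local_clusters(local))  # [[1, 2, 3, 4, 5], [7, 8]]
```

Union-find suits this step because each shared point costs a near-constant-time union, so chains of overlapping local clusters collapse into one global cluster without repeated pairwise set comparisons. How the paper distributes this work across MapReduce tasks is not stated in the abstract and is not modelled here.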