Authors: Du Juan[1], Zhang Zhuo[2], Cao Jianchun[1]
Affiliations: [1] Yellow River Conservancy Technical Institute, Kaifeng 475004, Henan, China; [2] Zhengzhou University, Zhengzhou 450001, Henan, China
Source: Computer Applications and Software, 2021, No. 11, pp. 288-294, 313 (8 pages)
Funding: Henan Provincial Science and Technology Research Program Project (192102210102).
Abstract: This paper proposes a MapReduce load-balancing method based on fast unbiased stratified graph sampling. A clustering algorithm is integrated into the MapReduce join operation, and an implementation of a MapReduce parallel clustering join algorithm is proposed. An unbiased stratified graph sampling algorithm then dynamically adjusts the sampling rate according to the clustering results, so that the target data of the join operation are sampled accurately and in a balanced way. Data-processing experiments on synthetic and real datasets, comparing the method with the hash join algorithm and with a clustering algorithm based on NS sampling, show that the proposed algorithm achieves good load balancing under different degrees of data skew, and that its running efficiency is not degraded by adopting the new sampling algorithm.
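The abstract does not spell out the sampling or partitioning rules, so the following is only a minimal Python sketch of the general idea under stated assumptions: join keys are grouped into clusters (strata), each cluster is sampled at its own rate, the samples are weighted by 1/rate so per-key load estimates stay unbiased (Horvitz-Thompson), and the estimates drive a reducer assignment that is compared against plain hash partitioning. The "hot"/"cold" cluster labels, the `adaptive_rates` rule, the greedy longest-first assignment and the hash fallback for unsampled keys are all hypothetical illustrations, not the authors' algorithm.

```python
import heapq
import random
from collections import Counter, defaultdict


def adaptive_rates(cluster_sizes, base_rate=0.01, min_per_cluster=50):
    """Hypothetical rule: raise a cluster's sampling rate until it yields
    about `min_per_cluster` expected samples (capped at 1.0)."""
    return {c: min(1.0, max(base_rate, min_per_cluster / n))
            for c, n in cluster_sizes.items()}


def stratified_sample(keys, cluster_of, rates, seed=0):
    """Bernoulli-sample each record at its cluster's rate; weighting each
    hit by 1/rate gives unbiased estimates of per-key record counts."""
    rng = random.Random(seed)
    est = defaultdict(float)
    for k in keys:
        r = rates[cluster_of[k]]
        if rng.random() < r:
            est[k] += 1.0 / r
    return est


def balanced_plan(est, n_reducers):
    """Greedy longest-first: give the heaviest estimated key to the
    reducer with the smallest estimated load so far."""
    heap = [(0.0, r) for r in range(n_reducers)]  # sorted list is a valid heap
    plan = {}
    for k, load in sorted(est.items(), key=lambda kv: -kv[1]):
        cur, r = heapq.heappop(heap)
        plan[k] = r
        heapq.heappush(heap, (cur + load, r))
    return plan


def reducer_loads(keys, partition, n_reducers):
    """Count how many records each reducer receives under `partition`."""
    loads = [0] * n_reducers
    for k in keys:
        loads[partition(k)] += 1
    return loads


N = 4
hot = [0, 4, 8, 12]            # skewed keys; all land on reducer 0 under hash % 4
cold = list(range(16, 816))    # 800 light keys
keys = [k for k in hot for _ in range(1500)] + [k for k in cold for _ in range(5)]
cluster_of = {**{k: "hot" for k in hot}, **{k: "cold" for k in cold}}

rates = adaptive_rates(Counter(cluster_of[k] for k in keys))
plan = balanced_plan(stratified_sample(keys, cluster_of, rates), N)

hash_loads = reducer_loads(keys, lambda k: hash(k) % N, N)
# Keys missing from the sample fall back to hash partitioning.
balanced = reducer_loads(keys, lambda k: plan.get(k, hash(k) % N), N)
print("hash:", hash_loads)  # hash: [7000, 1000, 1000, 1000]
print("sampled plan:", balanced)
```

Here the smaller "cold" cluster gets a higher sampling rate (0.0125) than the "hot" one (0.01), and the sampled plan spreads the four heavy keys over four reducers instead of piling them onto one. Because this sketch assigns whole keys, it cannot split a single key that is heavier than the average reducer load; handling that case would need an additional key-splitting step.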
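Keywords: big data; data skew; load balancing; unbiased stratified graph sampling; MapReduce platform; hash join algorithm; NS sampling clustering
CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]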