Load balancing in MapReduce combined with computing capacity of nodes

Original title: 结合节点计算能力的MapReduce负载均衡方法 (cited by: 2)

Authors: HU Linfa (胡林发), FU Xiaodong (付晓东) [1,2], LIU Li (刘骊), LIU Lijun (刘利军) [1]

Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; [2] Yunnan Key Laboratory of Computer Technology Application, Kunming University of Science and Technology, Kunming 650500, China

Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2023, No. 6, pp. 1154-1163 (10 pages)

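Funding: National Natural Science Foundation of China (62362043, 61962030); "Xingdian Talent Support Program" project (KKXY202203008); Yunnan Provincial Science and Technology Plan projects (202204BQ040010, 202205AF150003).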

Abstract: MapReduce is a widely used programming model in big data computing, offering significant benefits for compute-intensive tasks. However, the default hash partitioning method is prone to data skew, which unbalances the load across compute nodes, degrades overall computing performance, and wastes cluster resources. To address this problem, a partitioning method that takes node computing capacity into account is proposed. First, an independent sampling job is run, using the reservoir sampling algorithm to sample the data to be processed and to count the location and frequency of each key in the sample. Second, a partitioning strategy is formulated from these key statistics so that the load of each partition matches the computing capacity of its node, while network overhead is optimized at the same time. Finally, the computation job is run with the full dataset as input, and the established partitioning strategy is used to partition the intermediate data, producing the job's final output. Experimental results show that the proposed method balances load across nodes more evenly and significantly improves job execution efficiency.
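The pipeline described in the abstract (sample, count keys, build a capacity-aware partition plan) can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the partitioning strategy, so the greedy heuristic below is an assumption chosen for illustration, and it ignores key locations and network overhead, which the paper also considers. The names `reservoir_sample`, `assign_keys`, and `partition_for`, as well as the relative capacity values, are hypothetical.

```python
import random
import zlib
from collections import Counter


def reservoir_sample(stream, k, rng=None):
    """Reservoir sampling (Algorithm R): uniform sample of up to k items."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)        # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                sample[j] = item       # replace with probability k / (i + 1)
    return sample


def assign_keys(key_counts, capacities):
    """Assumed greedy heuristic: place the heaviest keys first, each on the
    partition whose load relative to its node capacity would be lowest."""
    load = [0] * len(capacities)
    assignment = {}
    ordered = sorted(key_counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for key, count in ordered:
        target = min(
            range(len(capacities)),
            key=lambda p: (load[p] + count) / capacities[p],
        )
        assignment[key] = target
        load[target] += count
    return assignment, load


def partition_for(key, assignment, num_partitions):
    """Look up the planned partition; keys missing from the sample fall back
    to a stable hash, mirroring default hash partitioning."""
    if key in assignment:
        return assignment[key]
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


if __name__ == "__main__":
    # Hypothetical key frequencies and relative node capacities.
    counts = Counter({"a": 60, "b": 30, "c": 20, "d": 10})
    plan, load = assign_keys(counts, capacities=[2.0, 1.0])
    print(plan)  # {'a': 0, 'c': 0, 'b': 1, 'd': 1}
    print(load)  # [80, 40]
```

In this toy run the node with twice the capacity receives twice the load. In a real Hadoop job the plan would be built from the sampled keys (for example `Counter(reservoir_sample(keys, k))`) and applied inside a custom partitioner; that wiring is omitted here.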

Keywords: load balancing; data skew; big data; sampling algorithm

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
