Authors: HU Linfa (胡林发), FU Xiaodong (付晓东) [1,2], LIU Li (刘骊), LIU Lijun (刘利军) [1]
Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; [2] Yunnan Key Laboratory of Computer Technology Application, Kunming University of Science and Technology, Kunming 650500, China
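Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) (《重庆邮电大学学报(自然科学版)》), 2023, No. 6, pp. 1154-1163 (10 pages)
Funding: National Natural Science Foundation of China (62362043, 61962030); "Xingdian Talent Support Program" project (KKXY202203008); Yunnan Provincial Science and Technology Plan projects (202204BQ040010, 202205AF150003).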
Abstract: MapReduce is a widely used programming model in big data computing, offering significant benefits for intensive computing tasks. However, its default Hash partitioning method is prone to data skew, which leaves the load unbalanced across compute nodes, degrades overall computing performance, and wastes a large amount of cluster resources. To address this problem, this paper proposes a partitioning method that takes node computing capacity into account. First, an independent sampling job is run, using the Reservoir sampling algorithm to sample the data to be processed and to count the location and frequency of each key in the sample. Second, a partition strategy is formulated from these key statistics so that the load of each partition matches the computing capacity of its node, while network overhead is optimized at the same time. Finally, the computation job is run on the full dataset, the intermediate data are partitioned according to the established strategy, and the job produces its final output. Experimental results show that the proposed method balances the load across nodes more evenly and significantly improves the execution efficiency of computation jobs.
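The first two steps of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their partitioning rule, so the greedy heuristic below (place the heaviest keys first, each on the node whose load-to-capacity ratio stays lowest) is a stand-in, and it ignores key location and network overhead, which the paper also optimizes. The function names `reservoir_sample` and `capacity_aware_partition`, the capacity values, and the key counts are all hypothetical.

```python
import random
from collections import Counter


def reservoir_sample(stream, k, rng=None):
    """Uniformly sample up to k items from a stream of unknown length (Algorithm R)."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                sample[j] = item
    return sample


def capacity_aware_partition(key_counts, capacities):
    """Assign keys to nodes so that load / capacity stays as even as possible.

    Greedy stand-in heuristic (an assumption, not the paper's rule):
    heaviest key first, each to the node with the lowest resulting ratio.
    """
    loads = [0] * len(capacities)
    assignment = {}
    ordered = sorted(key_counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for key, count in ordered:
        node = min(
            range(len(capacities)),
            key=lambda n: (loads[n] + count) / capacities[n],
        )
        assignment[key] = node
        loads[node] += count
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical skewed input: key "a" dominates the records.
    records = ["a"] * 600 + ["b"] * 250 + ["c"] * 100 + ["d"] * 50
    sample = reservoir_sample(records, 100, random.Random(42))
    key_counts = Counter(sample)  # sampled key frequencies
    # Hypothetical relative capacities: node 0 is twice as fast as node 1.
    assignment, loads = capacity_aware_partition(key_counts, [2.0, 1.0])
    print(assignment, loads)
```

In a real MapReduce job, the resulting key-to-node mapping would be consulted by a custom partitioner in place of the default hash rule; that wiring is omitted here.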
Classification No.: TP311 [Automation and Computer Technology: Computer Software and Theory]