Load Balancing Scheduling Policies for Spark Heterogeneous Clusters


Authors: TAO Yuwei [1]; XIE Aijuan [2]

Affiliations: [1] Office of IT Services and Big Data, Changzhou University, Changzhou 213164, Jiangsu, China; [2] School of Petrochemical Engineering, Changzhou University, Changzhou 213164, Jiangsu, China

Source: Journal of Changzhou University (Natural Science Edition), 2024, Issue 5, pp. 61-70 (10 pages)

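Funding: 2021 Jiangsu Province Education Science "14th Five-Year Plan" project (D/2021/01/131); 2021 Education and Teaching Research Project of the School of Petrochemical Engineering, Changzhou University (SHJY202101).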

Abstract: Spark, a scalable distributed platform, does not account for differences in node computing capability or for load balance when scheduling job tasks on heterogeneous clusters, which degrades system performance. To address this, the paper constructs a load-balancing scheduling policy for heterogeneous cluster nodes in the Spark environment. Compute nodes use a sampling algorithm to predict the data distribution and divide the data evenly into multiple partitions. The real-time load of each node is then obtained by weighting its static load and dynamic load, and job tasks are scheduled dynamically according to that load. Finally, the policy is compared and analyzed on a heterogeneous cluster using three benchmarks: Wordcount, TeraSort, and K-means. The experimental results show that the algorithm significantly reduces running time and improves the performance of the heterogeneous cluster.
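The abstract names two mechanisms without giving formulas: sample-based balanced partitioning and a real-time load obtained by weighting static and dynamic load. The following is a minimal Python sketch of how such mechanisms might look, not the authors' algorithm. All names (`sample_boundaries`, `Node`, `realtime_load`), the weights 0.4 and 0.6, and the normalization of capacity and utilization to [0, 1] are assumptions made purely for illustration.

```python
import random
from bisect import bisect_right
from dataclasses import dataclass


def sample_boundaries(keys, num_partitions, sample_size, seed=0):
    """Estimate range-partition split points from a random sample of the keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced sample quantiles serve as the partition boundaries.
    return [sample[len(sample) * i // num_partitions]
            for i in range(1, num_partitions)]


def partition_of(key, boundaries):
    """Return the index of the partition whose key range contains `key`."""
    return bisect_right(boundaries, key)


@dataclass
class Node:
    name: str
    static_capacity: float  # assumed: normalized hardware score in (0, 1], higher = stronger
    dynamic_load: float     # assumed: current utilization in [0, 1]


def realtime_load(node, w_static=0.4, w_dynamic=0.6):
    """Weighted load score (assumed form); a lower score marks a better target."""
    return w_static * (1.0 - node.static_capacity) + w_dynamic * node.dynamic_load


def pick_node(nodes):
    """Choose the node with the lowest weighted load for the next task."""
    return min(nodes, key=realtime_load)
```

In this sketch a strong but busy node can still rank ahead of a weak, lightly loaded one, which is the kind of trade-off a combined static and dynamic weighting is meant to capture; the actual weights and load metrics would have to come from the paper itself.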

Keywords: heterogeneity; job scheduling; load balancing; Spark

Classification code: TP302 [Automation and Computer Technology: Computer System Architecture]

 
