Research of Hybrid Resource Scheduling Framework of Heterogeneous Clusters for Dataflow

Cited by: 3


Authors: TANG Xiao-Chun[1], ZHAO Quan, FU Ying, ZHU Zi-Yu, DING Zhao, HU Xiao-Xue, LI Zhan-Huai[1] (School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China)

Affiliation: [1] School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China

Source: Journal of Software (《软件学报》), 2022, Issue 12, pp. 4704-4726 (23 pages)

Funding: National Key Research and Development Program of China (2018YFB1003400).

Abstract: The Dataflow model unifies batch processing and stream processing in big data computing. However, existing cluster resource scheduling frameworks for big data computing target either stream processing or batch processing, and are not suited to batch and stream processing jobs sharing cluster resources. In addition, when GPUs are used for big data analytics, the lack of an effective way to decouple CPU and GPU resources lowers resource utilization. Based on an analysis of existing cluster resource scheduling frameworks, this paper designs and implements HRM, a hybrid resource scheduling framework that is aware of batch and stream processing applications. Built on a shared-state architecture, HRM combines optimistic and pessimistic locking protocols to satisfy the different resource requirements of stream processing jobs and batch processing jobs. On compute nodes, it provides flexible binding of CPU and GPU resources and adopts a queue stacking technique, which meets the real-time requirements of stream processing jobs, reduces feedback delay, and enables GPU sharing. In simulations of large-scale job scheduling, the scheduling delay of HRM is only about 75% of that of a centralized scheduling framework. In tests with real workloads, CPU utilization increases by more than 25% when batch and stream processing jobs share the cluster under HRM. With the fine-grained job scheduling method, GPU utilization increases by more than 2 times and job completion time falls by about 50%.
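Illustrative note: the abstract states that HRM combines optimistic and pessimistic locking over a shared cluster state, but it does not say how the protocols are implemented or which job class uses which one. The Python sketch below is therefore only a generic illustration of the two claim styles, not the authors' design. All names (`NodeState`, `SharedState`, `commit_optimistic`, `claim_pessimistic`) are hypothetical.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Hypothetical per-node entry in a shared cluster state."""
    free_cpus: int
    version: int = 0  # bumped on every successful allocation
    lock: threading.Lock = field(default_factory=threading.Lock)


class SharedState:
    """Toy shared state that several schedulers could read and update."""

    def __init__(self, nodes):
        self.nodes = {name: NodeState(cpus) for name, cpus in nodes.items()}

    def snapshot(self, name):
        """Return (free_cpus, version) so a scheduler can plan without holding a lock."""
        node = self.nodes[name]
        with node.lock:
            return node.free_cpus, node.version

    def commit_optimistic(self, name, cpus, seen_version):
        """Optimistic claim: succeed only if the node is unchanged since the snapshot."""
        node = self.nodes[name]
        with node.lock:  # short critical section: validate, then write
            if node.version != seen_version or node.free_cpus < cpus:
                return False  # conflict or shortage: caller re-snapshots and retries
            node.free_cpus -= cpus
            node.version += 1
            return True

    def claim_pessimistic(self, name, cpus):
        """Pessimistic claim: hold the node lock across the whole check-and-allocate step."""
        node = self.nodes[name]
        with node.lock:
            if node.free_cpus < cpus:
                return False
            node.free_cpus -= cpus
            node.version += 1
            return True
```

In this sketch an optimistic commit can be rejected purely because another scheduler updated the node first, which keeps planning lock-free at the cost of occasional retries; a pessimistic claim never fails on a stale version but serializes schedulers on the node lock.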

Keywords: dataflow model; batch processing; stream processing; job awareness; CPU-GPU; queue stacking

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
