Cluster resource management techniques in dataflow computing environments  Cited by: 3

State-of-art research of cluster resource management in dataflow computing model

Authors: TANG Xiaochun (汤小春)[1]; FU Ying (符莹); DING Zhao (丁朝); MAO Anqi (毛安琪); LI Zhanhuai (李战怀)[1] (School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China)

Affiliation: [1] School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China

Source: Big Data Research (《大数据》), 2020, No. 3, pp. 87-100 (14 pages)

Funding: National Key Research and Development Program of China (No. 2018YFB1003400).

Abstract: The development of cluster-based high-performance computing has gone through three stages: separation of the computing and storage subsystems, integration of the computing and storage subsystems, and the dataflow programming model built on data parallelism. With the widespread use of dataflow programming models such as Spark and Flink in big data computing, job types have become highly varied. Ensuring that these diverse dataflow jobs can share cluster resources is the core problem of cluster resource management and a main means of reducing infrastructure cost. As the drawbacks of traditional cluster resource management have become increasingly apparent under the dataflow computing model, many alternatives have been proposed in recent years. Starting from the historical evolution of cluster resource management and taking the perspective of the dataflow programming model, this paper examines HoD, centralized scheduling, two-level scheduling, distributed scheduling, and hybrid scheduling. It describes the advantages, disadvantages, and current applications of each, and offers a reference for the use or development of cluster resource management and scheduling in dataflow computing environments.

Keywords: dataflow model; cluster resources; scheduling framework; big data

Classification number: TP31 [Automation and Computer Technology: Computer Software and Theory]

 
