Dual-plane Scheduling Model for AI Data Flows in Edge Clusters


Authors: WU Ming-jie (吴明杰), CHEN Qing-kui (陈庆奎) [2]

Affiliations: [1] School of Management, University of Shanghai for Science and Technology, Shanghai 200093, China; [2] School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2021, No. 6, pp. 1332-1339 (8 pages)

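Funding: Supported by the National Natural Science Foundation of China (61572325, 60970012); the Specialized Research Fund for the Doctoral Program of Higher Education, doctoral supervisor category (20113120110008); the Shanghai Key Science and Technology Project (16DZ1203603, 19DZ1208903); the Shanghai Engineering Research Center Construction Project (GCZX14014); the Shanghai Engineering Research Center for Large-scale IoT Common Technology of Smart Home (GCZX14014); the Shanghai First-class Discipline Construction Project (XTKX2012); and the Hujiang Foundation Research Base Special Project (C14001).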

Abstract: With the rise of edge AI, edge GPU clusters are widely used for real-time processing of large numbers of concurrent AI data flows. An AI data flow must not only be transmitted within the cluster but also be queued and computed on the computing nodes. To reduce response time, researchers have focused on scheduling algorithms that shorten the queuing time of tasks, while overlooking the time spent transmitting scheduling commands. In the traditional single-plane framework, scheduling commands and data travel over the same physical link, so when the volume of data transmitted in the cluster is high, scheduling commands are easily delayed or dropped, which causes scheduling to fail and can even degrade cluster performance or cause faults. This paper proposes a dual-plane scheduling model for AI data flows in edge clusters. First, a dual-plane framework is proposed that physically separates scheduling commands from data transmission so that the two do not interfere with each other. Second, DPDK-based multi-NIC parallel communication is used in the data plane to improve the efficiency and bandwidth of data transmission, and a reliable message-based transmission protocol is designed and implemented for AI data flows. Finally, a task migration scheduling model that accounts for both the network load and the computing load of computing nodes is proposed, aiming to reduce the queuing delay of data flows in the cluster. With no message loss, the transmission scheme of the dual-plane architecture increases the cluster's data flow capacity by about 30%; with no task dropping, the scheduling model of the dual-plane architecture increases the cluster's data flow capacity by about 15%.
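The abstract does not give the formulas of the task migration scheduling model, so the following is only a minimal sketch of the general idea of weighing network load and computing load together when choosing a migration target. The `NodeLoad` type, the weight `alpha`, the threshold `margin`, and the linear scoring rule are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class NodeLoad:
    """Hypothetical per-node load snapshot (not a structure from the paper)."""
    name: str
    net_load: float  # fraction of data-plane bandwidth in use, 0.0 to 1.0
    cpu_load: float  # fraction of compute capacity in use, 0.0 to 1.0


def combined_load(node: NodeLoad, alpha: float = 0.5) -> float:
    """Weighted sum of network and computing load; alpha is an assumed weight."""
    return alpha * node.net_load + (1.0 - alpha) * node.cpu_load


def pick_migration_target(
    source: NodeLoad,
    candidates: Sequence[NodeLoad],
    alpha: float = 0.5,
    margin: float = 0.1,
) -> Optional[NodeLoad]:
    """Return the least-loaded candidate, or None if migrating is not worthwhile.

    A task is migrated only when the best candidate's combined load is lower
    than the source's by at least `margin` (an assumed threshold).
    """
    best = min(candidates, key=lambda n: combined_load(n, alpha), default=None)
    if best is None:
        return None
    if combined_load(source, alpha) - combined_load(best, alpha) < margin:
        return None
    return best
```

For example, with a source node at 90% network load and 80% computing load, a candidate at (40%, 30%) would be preferred over one at (20%, 90%) under equal weights, because the second candidate's computing load dominates its score.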

Keywords: edge computing; edge cluster; DPDK; AI data flow; dual-plane architecture; task migration scheduling

CLC number: TP393 [Automation and Computer Technology: Computer Application Technology]

 
