ACC: agile congestion control in datacenter networks

Authors: Guoyuan YUAN; Yuan LU; Renjie ZHOU; Dezun DONG; Wei PENG (School of Computer Science, National University of Defense Technology, Changsha 410028, China; Northwest Institute of Nuclear Technology, Xi'an 710024, China)

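Affiliations: [1] School of Computer Science, National University of Defense Technology, Changsha 410028 [2] Northwest Institute of Nuclear Technology, Xi'an 710024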

Source: Scientia Sinica (Informationis), 2025, No. 1, pp. 46-63 (18 pages)

Funding: Supported by the National Key Research and Development Program of China (Grant No. 2022YFB4501702).

Abstract: Datacenter network traffic is increasingly complex and changes rapidly over time, which poses several challenges for congestion control based on in-network telemetry (INT) information. First, existing state-of-the-art algorithms mainly rely on the receiver's ACK (acknowledgment) packets to carry INT information back to the sender, which is how the sender learns the link load. Because ACK feedback is delayed by about one RTT (round-trip time), the sender cannot obtain INT information during the first RTT of a flow. Second, existing algorithms generally do not seize idle network bandwidth when a flow finishes transmitting, so the network inevitably wastes bandwidth. This paper explores INT-based congestion control, breaks the RTT-level barrier on control latency, mitigates bandwidth waste, and proposes ACC (agile congestion control), an ultra-precise and efficient congestion control algorithm. Compared with HPCC (high precision congestion control), an existing state-of-the-art INT-based algorithm, ACC pays special attention to the first RTT of a flow: during this period, switches send path congestion signals back to the sender, so the sender can react to network congestion more quickly. In addition, ACC addresses the ability of long flows to preempt bandwidth, helping them detect idle bandwidth early and thereby reducing the tail latency of network transmission. Extensive experiments evaluate the design. The results show that, while maintaining the same high throughput as HPCC, ACC effectively reduces queue length and tail flow latency. Specifically, in a large-scale Clos topology with the WebServer workload at a load intensity of 0.6, ACC reduces the average queue length by 9.6% and the 95th-percentile flow completion time by 29.7% relative to HPCC.

Keywords: datacenter network congestion control; in-network telemetry (INT); bandwidth preemption; switch-host cooperation

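Classification: TP393.06 [Automation and Computer Technology: Computer Application Technology] TP308 [Automation and Computer Technology: Computer Science and Technology]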

 
