Authors: CHEN Jinqian, GUO Shaoyong[1], LIU Chang[1], QI Feng[1], QIU Xuesong[1] (State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China)
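Affiliation: [1] State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876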
Source: Journal on Communications (《通信学报》), 2025, No. 2, pp. 1-17 (17 pages)
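Funding: National Natural Science Foundation of China (No. 62322103); Beijing Natural Science Foundation (No. 4232009); Fundamental Research Funds for the Central Universities (No. 2023ZCTH11).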
Abstract: Frequent interactions between clusters in intelligent computing centers cause recurrent network congestion, which makes it difficult to guarantee the real-time performance of intelligent services. To address this, a congestion control model driven by a deep reinforcement learning algorithm was constructed with the data processing unit (DPU) as its core carrier. The model was compressed by combining pruning and quantization, and was then converted into an efficient gradient-boosted decision tree through knowledge distillation, so that rate-adjustment actions precisely match real-time network conditions. Simulation results show that the proposed mechanism outperforms existing methods in both generalization capability and control effectiveness. Across multiple stress-test scenarios, it increases effective network throughput and the Jain fairness index (JAIN) by more than 10.8% and 8.9%, respectively; reduces P99 end-to-end latency and packet loss rate by more than 17.31% and 11.47%, respectively; and reduces the completion time of data flow transfer tasks in parallel computing scenarios by more than 11.23%. It also responds rapidly to sudden changes in network state.
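The JAIN fairness metric cited in the abstract is, in its standard form, Jain's index J = (Σx_i)^2 / (n·Σx_i^2) over the per-flow throughputs x_i of n flows. It equals 1 when every flow receives the same throughput and falls toward 1/n when a single flow takes all the bandwidth. The record does not state exactly how the paper computes it, so the Python sketch below assumes the standard definition; the function name `jain_fairness_index` is hypothetical.

```python
from typing import Sequence


def jain_fairness_index(throughputs: Sequence[float]) -> float:
    """Standard Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Assumption: this is the textbook definition, not necessarily the
    exact variant used in the paper. Result lies in [1/n, 1].
    """
    n = len(throughputs)
    if n == 0:
        raise ValueError("need at least one flow")
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if sum_sq == 0:
        raise ValueError("all throughputs are zero; index is undefined")
    return total * total / (n * sum_sq)


if __name__ == "__main__":
    # Four flows with equal throughput: perfectly fair.
    print(jain_fairness_index([10.0, 10.0, 10.0, 10.0]))  # 1.0
    # One flow takes everything: the index drops to 1/n = 0.25.
    print(jain_fairness_index([40.0, 0.0, 0.0, 0.0]))     # 0.25
    # Two flows at a 3:1 split: 40^2 / (2 * 1000) = 0.8.
    print(jain_fairness_index([30.0, 10.0]))              # 0.8
```

Because the index is scale-invariant, it compares fairness across scenarios with different link capacities, which is why a relative improvement such as the reported 8.9% is meaningful across stress tests.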
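Keywords: congestion control; multi-agent deep reinforcement learning; intelligent computing center network; remote direct memory access (RDMA) network; data processing unit (DPU)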
Classification number: TP393 [Automation and Computer Technology / Computer Application Technology]