Research on Intelligent Routing Technology Based on Deep Reinforcement Learning

Cited by: 7

Authors: HUANG Wanwei; ZHENG Xiangyu; ZHANG Chaoqin; WANG Sunan[3]; ZHANG Xiaohui

Affiliations: [1] College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, Henan, China; [2] College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, Henan, China; [3] School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, Guangdong, China; [4] Henan Xin'an Communication Technology Co., Ltd., Zhengzhou 450001, Henan, China

Source: Journal of Zhengzhou University (Engineering Science), 2023, No. 1, pp. 44-51 (8 pages)

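Funding: National Natural Science Foundation of China (62002382, 62072416); Henan Province Key R&D and Promotion Special Project (Science and Technology Research) (222102210175, 222102210111); 2022 Henan Province Quality Teaching Case Project for Professional Degree Postgraduates (YJS2022AL035).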

Abstract: Existing intelligent routing algorithms suffer from slow convergence, high average delay, and low bandwidth utilization. To address these problems, this study proposes RDPG-Route, a multi-path intelligent routing algorithm based on deep reinforcement learning (DRL). The algorithm uses the recurrent deterministic policy gradient (RDPG) as its training framework and a long short-term memory (LSTM) network as its neural network. Exploiting RDPG's strength in handling high-dimensional problems and the storage capacity of the memory in the LSTM recurrent cell, the dynamically changing network state is fed into the neural network for training. Once training has converged, the action values output by the neural network are used as network link weights, and traffic is split according to a multi-path routing strategy, so that network routing is adjusted intelligently and dynamically. Finally, RDPG-Route is compared with the ECMP, DRL-TE, and DRL-R-DDPG routing algorithms. The results show that RDPG-Route has better convergence and effectiveness: compared with the other intelligent routing algorithms, it reduces the average end-to-end delay by at least 7.2%, improves throughput by 6.5%, and reduces the packet loss rate by 8.9% and the maximum link utilization by 6.3%.
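The final step described in the abstract, using the network's action values as link weights and then splitting traffic over multiple paths, can be sketched as follows. The abstract does not say how many candidate paths are used or how traffic is divided among them, so this is only a minimal sketch under stated assumptions: the actor emits one positive weight per directed link, the k lowest-cost loop-free paths between a source and destination are the candidates, and each path receives a share proportional to the inverse of its total weight. The function names (`apply_link_weights`, `k_lowest_cost_paths`, `split_traffic`), the inverse-cost rule, and the toy topology are hypothetical; the RDPG/LSTM training itself is not shown.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

Link = Tuple[str, str]
Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: link weight}
Path = List[str]


def apply_link_weights(topology: Dict[str, List[str]],
                       weights: Dict[Link, float]) -> Graph:
    """Attach one weight per directed link (assumed shape of the actor output)."""
    graph: Graph = {node: {} for node in topology}
    for u, neighbors in topology.items():
        for v in neighbors:
            w = weights[(u, v)]
            if w <= 0:
                raise ValueError(f"link {u}->{v} needs a positive weight, got {w}")
            graph[u][v] = w
    return graph


def k_lowest_cost_paths(graph: Graph, src: str, dst: str,
                        k: int) -> List[Tuple[float, Path]]:
    """Return up to k loop-free src->dst paths in increasing total weight.

    Uniform-cost search over partial paths: exact for positive weights,
    but exponential in the worst case, so only suited to small topologies.
    """
    if src == dst:
        return []
    tie = itertools.count()  # tie-breaker so heapq never compares lists
    frontier = [(0.0, next(tie), [src])]
    found: List[Tuple[float, Path]] = []
    while frontier and len(found) < k:
        cost, _, path = heapq.heappop(frontier)
        node = path[-1]
        if node == dst:
            found.append((cost, path))
            continue
        for nxt, w in graph[node].items():
            if nxt not in path:  # skip nodes already on the path (no loops)
                heapq.heappush(frontier, (cost + w, next(tie), path + [nxt]))
    return found


def split_traffic(demand: float,
                  paths: List[Tuple[float, Path]]) -> List[Tuple[Path, float]]:
    """Assumed rule: each path's share is proportional to 1 / path cost."""
    inverse = [1.0 / cost for cost, _ in paths]
    total = sum(inverse)
    return [(path, demand * inv / total)
            for (_, path), inv in zip(paths, inverse)]


if __name__ == "__main__":
    # Toy topology; the weights stand in for one hypothetical actor output.
    topology = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
    weights = {("A", "B"): 1.0, ("B", "D"): 1.0,
               ("A", "C"): 1.0, ("C", "D"): 3.0}
    graph = apply_link_weights(topology, weights)
    paths = k_lowest_cost_paths(graph, "A", "D", k=2)
    for path, rate in split_traffic(90.0, paths):
        print("-".join(path), rate)
```

Running it prints `A-B-D 60.0` and `A-C-D 30.0`: the path with half the cost carries twice the traffic. By contrast, ECMP divides traffic evenly and only over equal-cost shortest paths, whereas an inverse-cost rule like this one also places a smaller share on longer paths whenever the learned weights make them competitive.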

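Keywords: quality of experience; software-defined networking; deep reinforcement learning; routing algorithm; recurrent deterministic policy gradient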

CLC classification number: TP393 [Automation and Computer Technology: Computer Application Technology]

