Solving capacitated vehicle routing problems based on end-to-end deep reinforcement learning


Authors: Ge Bin [1]; Tian Wenzhi; Xia Chenxing; Qin Wangbo (School of Computer Science & Engineering, Anhui University of Science & Technology, Huainan, Anhui 232001, China; Institute of Energy, Hefei Comprehensive National Science Center, Hefei 230031, China)

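Affiliations: [1] School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232001; [2] Institute of Energy, Hefei Comprehensive National Science Center, Hefei 230031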

Source: Application Research of Computers (《计算机应用研究》), 2024, No. 11, pp. 3245-3250 (6 pages)

Funding: National Key Research and Development Program of China (2020YFB1314103).

Abstract: The capacitated vehicle routing problem (CVRP) is currently the most common problem model in supply chain applications and is mostly solved with heuristic algorithms. As the problem scale grows, however, heuristic algorithms become slow and cannot guarantee solution quality. This paper proposes an end-to-end deep reinforcement learning (DRL) network framework for the CVRP. First, an edge-aggregated graph attention network encoder (EGATE) produces feature embeddings of the graph representation of the vehicle routing problem. Then, a multi-head attention decoder (MAD) decodes the encoded representation, and a multi-decoding strategy is proposed to increase the diversity of solutions in the solution space. Next, the end-to-end network model is trained with the REINFORCE algorithm using a rollout baseline; the baseline is updated adaptively to improve training, and reward function normalization and the Adam optimizer are used to further optimize the algorithm. Finally, experiments on problems of different scales and comparisons with other algorithms verify the feasibility and effectiveness of the proposed end-to-end DRL framework. On the CVRPLIB public dataset, the trained model obtains good solutions with an average solving time of only 0.189 s.
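The training signal described in the abstract (REINFORCE with a rollout baseline plus reward normalization) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the normalization is assumed to be batch standardization of the advantage, and the baseline update is simplified to a comparison of mean costs, since the abstract does not specify either detail. The neural encoder and decoder are omitted; only the CVRP cost, the capacity check, and the advantage computation are shown.

```python
import math
from statistics import mean, pstdev


def route_length(coords, tour):
    """Total Euclidean length of a tour given as a list of node indices."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(tour, tour[1:]))


def is_feasible(tour, demands, capacity, depot=0):
    """Check that the load carried between depot visits never exceeds capacity."""
    load = 0
    for node in tour:
        if node == depot:
            load = 0  # the vehicle reloads whenever it returns to the depot
        else:
            load += demands[node]
            if load > capacity:
                return False
    return True


def normalized_advantages(sample_costs, baseline_costs, eps=1e-8):
    """Advantage = sampled cost - baseline cost, standardized over the batch.

    Assumption: "reward normalization" is modeled here as subtracting the
    batch mean and dividing by the batch standard deviation.
    """
    raw = [s - b for s, b in zip(sample_costs, baseline_costs)]
    mu, sigma = mean(raw), pstdev(raw)
    return [(r - mu) / (sigma + eps) for r in raw]


def should_update_baseline(candidate_costs, baseline_costs):
    """Simplified rule: adopt the candidate policy if it is cheaper on average."""
    return mean(candidate_costs) < mean(baseline_costs)


# Toy instance: depot at index 0 and three customers on a 4 x 3 rectangle.
coords = [(0, 0), (0, 3), (4, 3), (4, 0)]
demands = [0, 2, 2, 2]
tour = [0, 1, 2, 0, 3, 0]  # two vehicle trips, both starting and ending at the depot

tour_cost = route_length(coords, tour)
feasible_cap4 = is_feasible(tour, demands, capacity=4)
feasible_cap3 = is_feasible(tour, demands, capacity=3)
advantages = normalized_advantages([10.0, 12.0], [11.0, 11.0])
```

In a full training loop, each advantage would multiply the log-probability of the sampled tour to form the policy-gradient loss, and the baseline costs would come from greedy rollouts of a frozen copy of the policy.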

Keywords: vehicle routing problem; path planning; end-to-end model; deep reinforcement learning; REINFORCE algorithm with baseline

Classification code: TP399 [Automation and Computer Technology: Computer Application Technology]

 
