Deep reinforcement learning combined with graph attention model to solve TSP

Cited by: 4

Authors: Wang Yang; Chen Zhibin[1]; Yang Xiaoxiao; Wu Zhaorui (Faculty of Science, Kunming University of Science and Technology, Kunming, 650000, China)

Affiliation: [1] Faculty of Science, Kunming University of Science and Technology, Kunming 650000

Source: Journal of Nanjing University (Natural Science), 2022, No. 3, pp. 420-429 (10 pages)

Funding: National Natural Science Foundation of China (11761042).

Abstract: The Traveling Salesman Problem (TSP) is a classic Combinatorial Optimization Problem (COP) that has been studied repeatedly for many years. In recent years, Deep Reinforcement Learning (DRL) has been widely applied in autonomous driving, industrial automation, games, and other fields, showing strong decision-making and learning ability. This paper combines DRL with a graph attention model to solve the TSP by minimizing the path length. The behavior network parameters are trained with an improved REINFORCE algorithm, which effectively reduces variance and helps avoid local optima. Positional Encoding (PE) is used in the encoding structure so that multiple initial nodes satisfy translation invariance during embedding, which enhances the stability of the model. Further, the Graph Neural Network (GNN) and Transformer architectures are combined, and GNN aggregation is applied to the Transformer decoding stage for the first time, effectively capturing the topological structure of the graph and the latent relationships between points. Experimental results show that, on the 100-TSP problem, the model's optimization performance surpasses current DRL-based methods and some traditional algorithms.
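To make the objective in the abstract concrete, the following is a minimal sketch of the quantity being minimized (the length of a closed tour) and of how a baseline enters a REINFORCE-style update. It is not the authors' improved REINFORCE: the abstract does not describe their baseline, so the batch-mean baseline here, and the names `tour_length` and `advantages`, are assumptions made purely for illustration.

```python
import math

def tour_length(coords, tour):
    """Euclidean length of the closed tour that visits `tour` in order."""
    n = len(tour)
    # (i + 1) % n closes the loop from the last city back to the first.
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
        for i in range(n)
    )

def advantages(lengths, baseline):
    """REINFORCE weights (L - b) that scale grad log p(tour).

    Subtracting a baseline b leaves the expected gradient unchanged
    but can reduce its variance. The choice of b is an assumption here.
    """
    return [length - baseline for length in lengths]

# Four cities on the corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
perimeter_tour = [0, 1, 2, 3]   # walks around the square
crossing_tour = [0, 2, 1, 3]    # uses both diagonals

lengths = [tour_length(square, perimeter_tour),
           tour_length(square, crossing_tour)]
# Assumed baseline: the mean tour length over the sampled batch.
weights = advantages(lengths, baseline=sum(lengths) / len(lengths))
```

With this baseline the shorter perimeter tour receives a negative weight and the crossing tour a positive one, so a gradient step that minimizes expected length raises the probability of the shorter tour.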

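Keywords: deep reinforcement learning; traveling salesman problem; graph attention model; graph neural network; combinatorial optimization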

Classification: O22 [Science: Operations Research and Cybernetics]; TP18 [Science: Mathematics]
