检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:王扬 陈智斌[1] 杨笑笑 吴兆蕊 Wang Yang;Chen Zhibin;Yang Xiaoxiao;Wu Zhaorui(Faculty of Science,Kunming University of Science and Technology,Kunming,650000,China)
出 处:《南京大学学报(自然科学版)》2022年第3期420-429,共10页Journal of Nanjing University(Natural Science)
基 金:国家自然科学基金(11761042)。
摘 要:旅行商问题(Traveling Salesman Problem,TSP)是组合最优化问题(Combinatorial Optimization Problem,COP)中的经典问题,多年以来一直被反复研究.近年来深度强化学习(Deep Reinforcement Learning,DRL)在无人驾驶、工业自动化、游戏等领域的广泛应用,显示了强大的决策力和学习能力.结合DRL和图注意力模型,通过最小化路径长度求解TSP问题.改进REINFORCE算法,训练行为网络参数,可以有效地减小方差,防止局部最优;在编码结构中采用位置编码(Positional Encoding,PE),使多重的初始节点在嵌入的过程中满足平移不变性,可以增强模型的稳定性;进一步结合图神经网络(Graph Neural Network,GNN)和Transformer架构,首次将GNN聚合操作处理应用到Transformer的解码阶段,有效捕捉图上的拓扑结构及点与点之间的潜在关系.实验结果显示,模型在100-TSP问题上的优化效果超越了目前基于DRL的方法和部分传统算法.Traveling Salesman Problem(TSP) is a classic problem in Combinatorial Optimization Problem(COP),which has been repeatedly studied for many years. In recent years,Deep Reinforcement Learning(DRL)has been widely applied in driverless,industrial automation,game and other fields,showing strong decision-making and learning ability. In this paper,DRL and graph attention model are combined to solve TSP by minimizing the path length. Specifically,the behavioral network parameters are trained by an improved REINFORCE algorithm to effectively reduce the variance and prevent local optima;Positional Encoding(PE) is used to the encoding structure to make the multiple node satisfy translation invariance during the embedding process and enhance the stability of the model. Further,we combine Graph Neural Network(GNN) and Transformer architecture,and apply GNN aggregate operation processing to transformer decoding stage for the first time,which effectively capture the topological structure of the graph and the potential relationships between points. The experimental results show that the optimization effect of the model on the 100-TSP problem surpasses the current DRL-based methods and some traditional algorithms.
关 键 词:深度强化学习 旅行商问题 图注意力模型 图神经网络 组合最优化
分 类 号:O22[理学—运筹学与控制论] TP18[理学—数学]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.117