深度强化学习解决动态旅行商问题  被引量:5

Solving dynamic traveling salesman problem by deep reinforcement learning

在线阅读下载全文

作  者:陈浩杰 范江亭 刘勇[1] CHEN Haojie;FAN Jiangting;LIU Yong(College of Computer Science and Technology,Heilongjiang University,Harbin Heilongjiang 150006,China)

机构地区:[1]黑龙江大学计算机科学与技术学院,哈尔滨150006

出  处:《计算机应用》2022年第4期1194-1200,共7页journal of Computer Applications

基  金:黑龙江省自然科学基金资助项目(LH2020F043);黑龙江大学研究生创新科研项目(YJSCX2021-197HLJU)。

摘  要:针对未设计启发式算法的组合优化问题设计统一的解决方案已成为机器学习领域的一个研究热点,目前成熟的技术主要针对静态的组合优化问题,但是对于加入动态变化的组合优化问题还没有得到充分的解决。为了解决以上问题,提出一个将多头注意力机制与分层强化学习结合来求解动态图上的旅行商问题的轻量级模型Dy4TSP。首先,用以多头注意力机制为基础的预测网络处理来自图卷积神经网络的节点表征向量输入;然后,借助分布式强化学习算法训练来快速地预估图中每个节点被输出作为最优解的可能性,使得模型在不同的可能性中全面探索问题的最优解决方案空间;最后,训练后的模型将实时地生成满足具体目标奖励函数的动作决策序列。该模型在3个组合优问题上进行了评估,实验结果表明,该模型在经典旅行商系列问题中解的质量比开源求解器LKH3高0.15~0.37个单位,明显优于带有边嵌入的图注意网络(EGATE)等最新的算法;并且在其他的动态旅行商问题中可以达到0.1~1.05的最优路径差距,结果也略胜一筹。Designing a unified solution to the combinational optimization problems of undesigned heuristic algorithms has become a research hotspot in the field of machine learning.At present,mature technologies are mainly aiming at static combinatorial optimization problems,but the combinational optimization problems with dynamic changes are not fully solved.In order to solve above problems,a lightweight model called Dy4TSP(Dynamic model for Traveling Salesman Problems)was proposed,which combined multi-head-attention mechanism with distributed reinforcement learning to solve the traveling salesman problem on a dynamic graph.Firstly,the node representation vector from graph convolution neural network was processed by the prediction network based on multi-head-attention mechanism.Then,the distributed reinforcement learning algorithm was used to quickly predict the possibility that each node in the graph was output as the optimal solution,and the optimal solution space of the problems in different possibilities were comprehensively explored.Finally,the action decision sequence which could meet the specific reward function in real time was generated by the trained model.The model was evaluated on three typical combinatorial optimization problems,and the experimental results showed that the solution qualities of the proposed model are 0.15 to 0.37 units higher than those of the open source solver LKH3(Lin-Kernighan-Helsgaun 3),and are significantly better than those of the latest algorithms such as Graph Attention Network with Edge Embedding(EGATE).The proposed model can reach an optimal path gap of 0.1 to 1.05 in other dynamic traveling salesman problems,and the results are slightly better.

关 键 词:组合优化问题 机器学习 强化学习 深度学习 图卷积神经网络 分布式学习 多智能体 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象