基于多起点和Mask策略的深度强化学习算法求解覆盖旅行商问题

Deep reinforcement learning algorithm based on multi-start and Mask strategy for solving the covering salesman problem

作　　者：方伟接中冰陆恒杨张涛[3] FANG Wei;JIE Zhong-bing;LU Heng-yang;ZHANG Tao(International Joint Laboratory on Artificial Intelligence of Jiangsu Province,Jiangnan University,Wuxi 214122,China;Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence,Jiangnan University,Wuxi 214122,China;China Ship Scientific Research Center,Wuxi 214082,China)

机构地区：[1]江南大学江苏省人工智能国际合作联合实验室,江苏无锡214122 [2]江南大学江苏省模式识别与计算智能工程实验室,江苏无锡214122 [3]中国船舶科学研究中心,江苏无锡214082

出　　处：《控制与决策》2024年第4期1160-1166,共7页Control and Decision

基　　金：国家自然科学基金项目(62073155,62002137,62106088,62206113);船舶总体性能创新研究开放基金项目(22422213)。

摘　　要：覆盖旅行商问题(covering salesman problem,CSP)是旅行商问题的变体,在防灾规划、急救管理中有着广泛应用.由于传统方法求解问题实例耗时严重,近年来深度神经网络被提出用于解决该类组合优化问题,在求解速度和泛化性上有明显的优势.现有基于深度神经网络求解CSP的方法求解质量较低,特别在大规模实例上与传统的启发式方法相比存在较大差距.针对上述问题,提出一种新的基于深度强化学习求解CSP的方法,由编码器对输入特征进行编码,提出新的Mask策略对解码器使用自注意力机制构造解的过程进行约束,并提出多起点策略改善训练过程、提高求解质量.实验结果表明,所提方法对比现有基于深度神经网络的求解方法进一步缩小了最优间隙,同时有着更高的样本效率,在不同规模和不同覆盖类型的CSP中展现出更强的泛化能力,与启发式算法相比在求解速度上有10~40倍的提升.The covering salesman problem(CSP)is a variant of the traveling salesman problem,which is widely used in disaster planning and emergency management.Since the traditional solvers are time-consuming for solving problem instances,deep neural networks have been proposed for solving this type of combinatorial optimization problem in recent years,which have obvious advantages in terms of solving speed and generalization.However,existing methods based on deep neural networks for solving CSP problems have low solution quality,especially in large-scale instances,compared with traditional heuristics.Therefore,we propose a new method based on deep reinforcement learning to solve the CSP problem,in which the encoder encodes the input features,propose a new Mask strategy to constrain the decoder to construct a solution using the self-attention mechanism,and propose a multi-start strategy to improve the training process and the solution quality.Experimental results show that the proposed method further reduces the optimality gap compared with existing deep neural network-based solution methods,has higher sample efficiency,shows stronger generalization ability in CSP tasks of different sizes and coverage types,and has a 10-40 times improvement in solution speed compared with heuristic algorithms.

关键词：覆盖旅行商深度强化学习组合优化多起点 Mask策略

分类号：TP399[自动化与计算机技术—计算机应用技术]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于多起点和Mask策略的深度强化学习算法求解覆盖旅行商问题

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于多起点和Mask策略的深度强化学习算法求解覆盖旅行商问题

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索