Deep reinforcement learning approach for solving the takeout delivery problem (求解外卖配送问题的深度强化学习算法)

Authors: Zhang Xuyang, Liu Yong [1], Ma Liang [1] (Business School, University of Shanghai for Science & Technology, Shanghai 200093, China)

Affiliation: [1] Business School, University of Shanghai for Science & Technology, Shanghai 200093

Source: Application Research of Computers (《计算机应用研究》), 2025, No. 1, pp. 205-213 (9 pages)

Funding: Youth Fund Project for Humanities and Social Sciences Research of the Ministry of Education (21YJC630087).

Abstract: This paper takes minimizing the rider's cost-benefit ratio as the optimization objective and models the takeout delivery problem as a minimum ratio traveling salesman problem. To address the low accuracy and poor stability of existing algorithms for this problem, it proposes DRL-MFA, an algorithm based on deep reinforcement learning. First, the takeout delivery problem is formulated as a Markov decision process that simulates the interaction between the agent and the environment. Second, a multi-feature aggregation embedding sublayer in the encoder lets the features complement one another and improves the model's ability to capture nonlinear relationships. Finally, the decoder computes the probability distribution over solutions with an attention mechanism and a pointer network, and the network is trained with a policy gradient algorithm. Experiments on classic benchmark instances and a simulated case in Changchun show that the algorithm solves the takeout delivery problem effectively and achieves higher stability and accuracy than the heuristic algorithms it is compared with. In addition, parameter sensitivity experiments examine how different pricing strategies affect takeout delivery, which makes the results more practically relevant.
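To make the objective and the decoding step concrete, here is a minimal Python sketch. It is not the DRL-MFA implementation, which the abstract does not give in detail. It assumes the usual minimum ratio traveling salesman form (total tour cost divided by total tour benefit) and uses a hand-made score in place of learned attention scores. The function names `tour_ratio`, `masked_softmax`, and `greedy_rollout` and the matrix layout are hypothetical.

```python
import math


def tour_ratio(tour, cost, benefit):
    """Cost-benefit ratio of a closed tour: total cost / total benefit."""
    # Pair each node with its successor, wrapping back to the start.
    legs = list(zip(tour, tour[1:] + tour[:1]))
    total_cost = sum(cost[i][j] for i, j in legs)
    total_benefit = sum(benefit[i][j] for i, j in legs)
    return total_cost / total_benefit


def masked_softmax(scores, visited):
    """Pointer-style distribution in which visited nodes get probability 0."""
    live = [s for k, s in enumerate(scores) if k not in visited]
    top = max(live)  # subtract the maximum for numerical stability
    exps = [0.0 if k in visited else math.exp(s - top)
            for k, s in enumerate(scores)]
    z = sum(exps)
    return [e / z for e in exps]


def greedy_rollout(cost, benefit, start=0):
    """Build a tour by always choosing the most probable unvisited node.

    Assumption: the score -cost/benefit of the next leg is only a stand-in
    for the attention scores a trained decoder would produce.
    """
    n = len(cost)
    tour, visited = [start], {start}
    while len(tour) < n:
        cur = tour[-1]
        scores = [0.0 if k in visited else -cost[cur][k] / benefit[cur][k]
                  for k in range(n)]
        probs = masked_softmax(scores, visited)
        nxt = max(range(n), key=probs.__getitem__)
        tour.append(nxt)
        visited.add(nxt)
    return tour
```

In a policy gradient setup, the value returned by `tour_ratio` would serve as the cost signal for a sampled tour, and sampling from `masked_softmax` would replace the greedy `max` during training.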

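Keywords: takeout delivery problem; minimum ratio traveling salesman problem; deep reinforcement learning; multi-feature embedding; attention mechanism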

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

