Authors: ZHANG Zixian; GUAN Wei[1]; QI Geqi (Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport, Beijing Jiaotong University, Beijing 100044, China)
Affiliation: [1] Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport, Beijing Jiaotong University, Beijing 100044, China
Source: Journal of Transportation Engineering and Information, 2024, Issue 3, pp. 93-106 (14 pages)
Funding: Fundamental Research Funds for the Central Universities (2019JBZ003); National Natural Science Foundation of China (72288101).
Abstract: To address the route optimization problem for hazardous-material transportation vehicles, this study considers the practical need of a transportation company to serve all customers with multiple vehicles, using a multi-agent system to improve collaborative efficiency among the vehicles. The route optimization objectives are to minimize a weighted combination of travel time and safety risk, subject to time-window and load-capacity constraints. A multi-agent reinforcement learning model is constructed, and a meta-reinforcement learning method is applied to build a meta-model with stronger generalization ability. Hazardous-material transportation problems under different objective weights are abstracted as subtasks of multi-vehicle, multi-trip route optimization with time windows, and different embedding layers of a deep neural network are used to capture the high-dimensional features of each subtask. The meta-model is trained by combining the Reptile meta-learning algorithm with a rolling-baseline method: the former improves the optimizer's adaptability across subtasks, while the latter improves flexibility in solving each subtask by greedily selecting the action with the highest probability. Experimental results show that, compared with a transfer reinforcement learning method, the proposed multi-agent meta-reinforcement learning method improves the non-dominated point count by 12% and the hypervolume by 22%, indicating solutions closer to the Pareto-optimal front; among the decoding methods tested, beam-search sampling performs best.
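The meta-training scheme described in the abstract interleaves per-subtask adaptation with a Reptile outer update, which moves the meta-parameters a small step toward each subtask's adapted weights. The sketch below illustrates only that generic Reptile update on a toy parameter vector; the function names, the inner SGD loop, and the quadratic task gradients are illustrative assumptions, not the paper's actual model or training code.

```python
import numpy as np

def inner_update(theta, task_grad_fn, lr=0.01, steps=5):
    """Adapt a copy of the meta-parameters to one subtask
    (e.g., one objective-weighting scheme) with a few SGD steps."""
    w = theta.copy()
    for _ in range(steps):
        w -= lr * task_grad_fn(w)
    return w

def reptile_meta_step(theta, task_grad_fns, epsilon=0.1, **inner_kw):
    """Reptile outer update: for each sampled subtask, adapt the
    parameters, then move the meta-parameters toward the adapted
    weights by a fraction epsilon of the difference."""
    for grad_fn in task_grad_fns:
        adapted = inner_update(theta, grad_fn, **inner_kw)
        theta = theta + epsilon * (adapted - theta)
    return theta

# Toy subtasks: quadratic losses (w - c)^2 with different minima c,
# standing in for differently weighted routing objectives.
tasks = [lambda w: 2.0 * (w - 1.0), lambda w: 2.0 * (w - 3.0)]
theta = reptile_meta_step(np.zeros(1), tasks)
```

The key design point is that Reptile never backpropagates through the inner loop; it only needs the adapted weights, which keeps meta-training cheap compared with second-order methods while still producing a meta-model that adapts quickly to new weighting schemes.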