Authors: ZHANG Zixian; GUAN Wei[1]; QI Geqi (Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport, Beijing Jiaotong University, Beijing 100044, China)
Affiliation: [1] Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport, Beijing Jiaotong University, Beijing 100044, China
Source: Journal of Transportation Engineering and Information, 2024, Issue 3, pp. 93-106 (14 pages)
Funding: Fundamental Research Funds for the Central Universities (2019JBZ003); National Natural Science Foundation of China (72288101).
Abstract: To address the route optimization problem for hazardous-material transportation vehicles, this study considers the practical need of a transportation company to serve all customers with multiple vehicles, using a multi-agent system to improve collaborative efficiency among the vehicles. The route optimization objectives are to minimize a weighted combination of travel time and safety risk, subject to time-window and load-capacity constraints. A multi-agent reinforcement learning model is constructed, and a meta-reinforcement learning method is applied to build a meta-model with stronger generalization ability. Hazardous-material transportation problems under different objective weights are abstracted as subtasks of multi-vehicle, multi-trip route optimization with time windows, and different embedding layers of a deep neural network are used to capture the high-dimensional features of each subtask. The meta-model is trained by combining the Reptile meta-learning algorithm with a rolling-baseline method: the former improves the optimizer's adaptability across subtasks, while the latter improves flexibility in solving each subtask by greedily selecting the action with the highest probability. Experimental results show that, compared with a transfer reinforcement learning method, the proposed multi-agent meta-reinforcement learning method improves the non-dominated point count by 12% and the hypervolume by 22%, indicating solutions closer to the Pareto-optimal front; among the decoding methods tested, beam-search sampling performs best.
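The meta-training scheme described in the abstract interleaves per-subtask adaptation with a Reptile outer update, which moves the meta-parameters a small step toward each subtask's adapted weights. The sketch below illustrates only that generic Reptile update on a toy parameter vector; the function names, the inner SGD loop, and the quadratic task gradients are illustrative assumptions, not the paper's actual model or training code.

```python
import numpy as np

def inner_update(theta, task_grad_fn, lr=0.01, steps=5):
    """Adapt a copy of the meta-parameters to one subtask
    (e.g., one objective-weighting scheme) with a few SGD steps."""
    w = theta.copy()
    for _ in range(steps):
        w -= lr * task_grad_fn(w)
    return w

def reptile_meta_step(theta, task_grad_fns, epsilon=0.1, **inner_kw):
    """Reptile outer update: for each sampled subtask, adapt the
    parameters, then move the meta-parameters toward the adapted
    weights by a fraction epsilon of the difference."""
    for grad_fn in task_grad_fns:
        adapted = inner_update(theta, grad_fn, **inner_kw)
        theta = theta + epsilon * (adapted - theta)
    return theta

# Toy subtasks: quadratic losses (w - c)^2 with different minima c,
# standing in for differently weighted routing objectives.
tasks = [lambda w: 2.0 * (w - 1.0), lambda w: 2.0 * (w - 3.0)]
theta = reptile_meta_step(np.zeros(1), tasks)
```

The key design point is that Reptile never backpropagates through the inner loop; it only needs the adapted weights, which keeps meta-training cheap compared with second-order methods while still producing a meta-model that adapts quickly to new weighting schemes.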