Authors: LUN Jia-ming (伦嘉铭); JIANG Hai-ming (姜海明); XIE Kang (谢康)
Affiliation: School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou, Guangdong 510006, China
Source: Computer Simulation (《计算机仿真》), 2024, No. 4, pp. 129-135, 425 (8 pages)
Funding: National Natural Science Foundation of China (11874126); Guangdong Province "Leading Talent" Program (400180001).
Abstract: To alleviate service disruptions at bus depots, this paper proposes a dynamic departure control strategy based on reinforcement learning. The strategy uses a long short-term memory (LSTM) model to predict bus travel times, so that the agent can perceive the headway state of both the vehicles waiting at the depot and the vehicles in operation, and can therefore better evaluate the long-term impact of its decisions. For the scenario in which the depot has no vehicle available to dispatch, a state-dependent differentiable function masks invalid actions when the action probability distribution is computed, which prevents the agent from issuing invalid commands. The reward function penalizes large departure intervals, and the model is trained with proximal policy optimization (PPO). Simulation results show that, compared with traditional methods, the proposed method not only effectively avoids service disruptions at the depot but also yields more balanced vehicle load factors, shorter passenger waiting times, and higher vehicle utilization.
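Keywords: bus service disruption; real-time control; reinforcement learning; proximal policy optimization; invalid action masking
Classification: TP391.9 [Automation and Computer Technology - Computer Application Technology]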
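The abstract mentions masking invalid actions when the action probability distribution is computed. The paper's exact masking function is not given in this record, so the following is only a minimal sketch of one common approach: replacing the logits of invalid actions with a large negative constant before the softmax. The two-action space (hold, dispatch), the function names, and the `neg_fill` constant are all assumptions made for illustration and are not taken from the paper.

```python
import math

def masked_action_probs(logits, valid_mask, neg_fill=-1e9):
    """Softmax over logits after masking out invalid actions.

    Assumed approach: invalid logits are replaced by a large negative
    constant, so their probability becomes (numerically) zero.
    """
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    masked = [z if ok else neg_fill for z, ok in zip(logits, valid_mask)]
    m = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in masked]
    total = sum(exps)
    return [e / total for e in exps]

def depot_action_mask(buses_at_depot):
    """Hypothetical action space: index 0 = hold, index 1 = dispatch.

    Dispatching is only valid when at least one bus is at the depot.
    """
    return [True, buses_at_depot > 0]

logits = [0.2, 1.5]  # made-up policy outputs that favor dispatching

# Empty depot: the dispatch action is masked, so all mass goes to "hold".
print(masked_action_probs(logits, depot_action_mask(0)))  # [1.0, 0.0]

# Two buses waiting: both actions stay valid and dispatch is preferred.
print(masked_action_probs(logits, depot_action_mask(2)))
```

Because the mask depends only on the state (here, the number of buses at the depot), a policy-gradient method such as PPO can use the masked distribution for both sampling and the loss, and masked actions then receive no gradient through their logits.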