Energy-Saving Optimization Control Method for Urban Rail Trains Based on Deep Reinforcement Learning

Authors: GUO Xiao[1]; MENG Jianjun[1,2,3]; CHEN Xiaoqiang[1,2,3,4]; XU Ruxun; LI Decang[1,2,3]; SONG Mingrui

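Affiliations: [1] Mechatronics T&R Institute, Lanzhou Jiaotong University, Lanzhou 730070, China; [2] Gansu Logistics and Transportation Equipment Information Technology Research Center, Lanzhou 730070, China; [3] Gansu Logistics and Transportation Equipment Industry Technical Center, Lanzhou 730070, China; [4] School of Mechatronics Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China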

Source: Railway Standard Design (《铁道标准设计》), 2024, No. 7, pp. 185-191, 217 (8 pages)

Funding: National Natural Science Foundation of China (72061021, 62063013); Gansu Province Outstanding Graduate Student "Innovation Star" Project (2022CXZX-517).

Abstract: To improve the control performance of automatic train operation (ATO) for urban rail trains, and to address the frequent operating-mode switching and high traction energy consumption of ATO target-speed-profile tracking control, this paper designs a deep reinforcement learning (DQN) train control method that takes punctuality, precise stopping, and energy consumption as optimization objectives and uses a time redundancy (TR) planning reference system as an active constraint. A dynamics model of the urban rail train and a multi-objective fitness function are established. A DQN train controller constrained by the TR planning reference system is defined, and its action space and reward function are specified. Finally, the update rule for the controller's neural network is given, with the Q-network parameters updated by stochastic gradient descent. The results show that the TR-DQN algorithm, which adds the TR planning reference system as a constraint, improves both the convergence speed of DQN and its stability during iterative training. Compared with the traditional PID algorithm for target-speed-profile tracking, the TR-DQN method, which adjusts the train operation strategy dynamically, reduces energy consumption by 12.32% and switches operating modes between stations less frequently. Across the three planned inter-station running times tested, traction energy consumption falls successively by 7.5% and 6.4%; both the frequency of operating-mode switching between stations and traction energy consumption decrease as the planned trip time increases.
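The ingredients named in the abstract (a train dynamics model, a discrete action space, a reward built from energy, stopping, and punctuality terms, and a stochastic-gradient update of the Q function) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the point-mass model, the resistance coefficients, the three-action set, the reward weights, and the linear Q approximator that stands in for the Q-network. The TR planning reference system itself is not modeled; a remaining-time feature only hints at where it would enter the state.

```python
# Illustrative sketch only: all constants and names are assumptions,
# not values or definitions from the paper.
ACTIONS = (-1.0, 0.0, 1.0)   # brake, coast, traction (normalized force command)
MASS = 2.0e5                 # train mass in kg (assumed)
MAX_FORCE = 2.0e5            # peak traction/brake force in N (assumed)
DAVIS = (2000.0, 40.0, 6.0)  # running resistance a + b*v + c*v^2 in N (assumed)


def step(pos, vel, action, dt=1.0):
    """Advance a point-mass train one time step; return (pos, vel, energy_J)."""
    a, b, c = DAVIS
    resistance = a + b * vel + c * vel * vel if vel > 0 else 0.0
    force = ACTIONS[action] * MAX_FORCE
    accel = (force - resistance) / MASS
    new_vel = max(0.0, vel + accel * dt)
    new_pos = pos + 0.5 * (vel + new_vel) * dt
    # Only traction draws energy in this sketch (no regenerative braking).
    energy = max(force, 0.0) * (new_pos - pos)
    return new_pos, new_vel, energy


def reward(energy, done, stop_error, time_error, w_e=1e-6, w_s=1.0, w_t=1.0):
    """Penalize traction energy each step; add stop/time penalties at episode end."""
    r = -w_e * energy
    if done:
        r -= w_s * abs(stop_error) + w_t * abs(time_error)
    return r


def features(pos, vel, time_left):
    """Hand-scaled state features standing in for the Q-network input."""
    return [1.0, pos / 1000.0, vel / 20.0, time_left / 100.0]


def q_values(weights, feats):
    """Linear Q(s, a) = w_a . phi(s), with one weight vector per action."""
    return [sum(w * f for w, f in zip(w_a, feats)) for w_a in weights]


def td_update(weights, feats, action, rew, next_feats, done, gamma=0.99, lr=0.01):
    """One SGD step on 0.5 * (target - Q(s, a))^2 with the target held fixed."""
    target = rew if done else rew + gamma * max(q_values(weights, next_feats))
    td_error = target - q_values(weights, feats)[action]
    weights[action] = [w + lr * td_error * f for w, f in zip(weights[action], feats)]
    return td_error
```

In a full DQN the linear approximator would be replaced by a neural network, and transitions would typically be drawn from a replay buffer with a separate target network; those parts are omitted here to keep the update rule visible.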

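Keywords: urban rail transit; automatic train operation; time planning system; energy-saving operation; deep reinforcement learning; DQN algorithm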

Classification: U231 [Transportation Engineering - Road and Railway Engineering]; U283.1
