A new interval control method for train control system based on reinforcement learning
(基于强化学习的新型列控系统区间行车间隔控制方法)

Cited by: 4

Authors: FU Wenxiu (付文秀) [1]; LI Ya (李亚); LYU Jidong (吕继东) [1]; LI Danyong (李丹勇); LI Yang (李洋)

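Affiliations: [1] School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China; [2] China Railway Jinan Bureau Group Co., Ltd., Jinan 250001, China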

Source: Journal of Beijing Jiaotong University, 2021, No. 5, pp. 63-73 (11 pages)

Funding: Fundamental Research Funds for the Central Universities (2020JBZD002); Beijing Natural Science Foundation (L201004).

Abstract: Train interval control is key to ensuring safe train operation and increasing traffic density. The new Vehicle-to-Vehicle (V2V) communication-based Train Control System (VBTC) can perceive more information about the train operating environment, which shortens the train interval and improves operating efficiency. This paper treats train speed control as a decision-making process and uses a reinforcement learning algorithm to control train speed within a section in real time under the VBTC. First, using the state information of surrounding trains obtained through V2V communication, a Monte Carlo Tree Search algorithm generates the train's dynamic speed adjustment sequence in real time. Next, a Dynamic Programming algorithm analyzes this sequence to determine the speed control strategy the train should adopt at the current moment. Finally, interval control scenarios with multiple trains under different initial conditions are simulated. The simulation results show that, under the same scenario, the reinforcement learning algorithm has some advantages over a fuzzy control algorithm in response speed, settling time, overall fluctuation, and overshoot.
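To make the first stage of the abstract more concrete, the following is a minimal sketch of Monte Carlo Tree Search (UCT) choosing an acceleration command for a following train. It is not the authors' implementation: the abstract does not specify the state, action set, reward, or train dynamics, so the two-train point-mass model, the constants, and the names `step`, `reward`, `simulate`, and `plan` are all illustrative assumptions. The Dynamic Programming stage described in the abstract is not reproduced here; the sketch simply returns the most-visited root action.

```python
import math
import random

ACTIONS = (-1.0, 0.0, 1.0)  # assumed acceleration commands, m/s^2
DT = 1.0                    # assumed decision step, s
TARGET_GAP = 1000.0         # assumed desired spacing, m
HORIZON = 8                 # assumed planning depth, steps


def step(state, accel, lead_speed):
    """Advance (gap, follower_speed) one step; the leader keeps a constant speed."""
    gap, speed = state
    new_speed = max(0.0, speed + accel * DT)
    mean_speed = 0.5 * (speed + new_speed)
    return (gap + (lead_speed - mean_speed) * DT, new_speed)


def reward(state):
    """Negative scaled spacing error, with a large penalty if the gap closes."""
    gap, _ = state
    if gap <= 0.0:
        return -100.0
    return -abs(gap - TARGET_GAP) / 100.0


class Node:
    def __init__(self, state, depth):
        self.state, self.depth = state, depth
        self.children = {}  # action -> Node
        self.visits = 0     # times this node was entered from its parent
        self.total = 0.0    # sum of returns observed through this node


def rollout(state, depth, lead_speed, rng):
    """Random-policy return from `state` until the planning horizon."""
    total = 0.0
    for _ in range(depth, HORIZON):
        state = step(state, rng.choice(ACTIONS), lead_speed)
        total += reward(state)
    return total


def simulate(node, lead_speed, rng, c):
    """One MCTS iteration below `node`: select or expand, roll out, back up."""
    if node.depth >= HORIZON:
        return 0.0
    untried = [a for a in ACTIONS if a not in node.children]
    if untried:
        # Expansion: add one new child and estimate it with a random rollout.
        action = rng.choice(untried)
        child = Node(step(node.state, action, lead_speed), node.depth + 1)
        node.children[action] = child
        future = rollout(child.state, child.depth, lead_speed, rng)
    else:
        # Selection: pick the child with the highest UCB1 score.
        n_parent = sum(ch.visits for ch in node.children.values())

        def ucb(ch):
            return ch.total / ch.visits + c * math.sqrt(math.log(n_parent) / ch.visits)

        child = max(node.children.values(), key=ucb)
        future = simulate(child, lead_speed, rng, c)
    value = reward(child.state) + future
    child.visits += 1
    child.total += value
    return value


def plan(gap, speed, lead_speed, iterations=600, seed=0, c=1.4):
    """Return the most-visited root action after a fixed number of iterations."""
    rng = random.Random(seed)
    root = Node((gap, speed), 0)
    for _ in range(iterations):
        simulate(root, lead_speed, rng, c)
    return max(root.children, key=lambda a: root.children[a].visits)
```

With these assumed settings, calling `plan(500.0, 20.0, 20.0)` models a follower that is 500 m short of the target gap while matching the leader's speed, so the search should favour braking. In a receding-horizon loop the chosen command would be applied for one step and the search repeated from the newly observed state.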

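Keywords: new-type train control system; interval control; reinforcement learning; Monte Carlo tree search algorithm; dynamic programming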

Classification: U231.7 [Transportation Engineering: Road and Railway Engineering]
