Optimization Method of Train Working Diagram Compilation Model of High-Speed Railways Based on Reinforcement Learning

Authors: FAN Wentian; ZENG Yongcheng; GUO Yiwei; YANG Ning; ZHANG Haifeng

Affiliations: [1] School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China; [2] Nanjing Artificial Intelligence Research of IA, Nanjing 211135, Jiangsu, China; [3] School of Information, University of Chinese Academy of Sciences, Nanjing 211135, Jiangsu, China; [4] Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; [5] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, China; [6] China Railway Train Working Diagram Technology Center, Beijing 100081, China; [7] Transportation & Economics Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China

Source: Railway Transport and Economy, 2025, No. 1, pp. 70-81 (12 pages)

Funding: Science and Technology Research and Development Program of China State Railway Group Co., Ltd. (P2022X012)

Abstract: A train working diagram of a high-speed railway may contain four types of conflicts: dwell times out of range, running times out of range, overtaking, and insufficient interval times. To resolve these conflicts, this paper implements a conflict-resolution agent based on reinforcement learning. A train working diagram compilation environment is established, a set of operators for resolving the different conflict types is designed, and the agent is trained in this environment with the proximal policy optimization (PPO) algorithm. To improve performance, a heuristic greedy algorithm is first used to collect samples for supervised pre-training of the networks; an entropy bonus strengthens exploration, and multi-policy decision making makes the final resolution more effective; model warm-up then fine-tunes the network parameters in each test environment so that the agent adapts to new conditions. The results show that, starting from the same initial environment, the proposed method needs significantly fewer steps to resolve all conflicts than the heuristic greedy algorithm, and its probability of resolving 100% of the conflicts is much higher. The method provides a new reference for train working diagram compilation models.
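The abstract outlines a three-stage pipeline: an operator-based compilation environment, supervised pre-training of the policy network on samples from a heuristic greedy algorithm, and PPO training with an entropy bonus. The paper publishes no code, so the following is a minimal, hypothetical PyTorch sketch of that pipeline. The toy environment TimetableEnv, the teacher greedy_action, and all hyperparameters are illustrative assumptions, not the authors' implementation; the real environment would expose full timetable states and the paper's conflict-resolution operators, and the multi-policy decision stage is omitted here.

```python
# Hypothetical sketch: PPO with greedy pre-training for timetable conflict
# resolution. TimetableEnv, greedy_action, and all hyperparameters are
# illustrative assumptions, not the paper's implementation.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

N_TYPES = 4  # dwell-time, running-time, overtaking, interval conflicts

class TimetableEnv:
    """Toy stand-in for the compilation environment: the state is the count
    of unresolved conflicts per type; each action applies one resolution
    operator, which may occasionally create a new conflict elsewhere."""
    def reset(self):
        self.counts = [random.randint(1, 5) for _ in range(N_TYPES)]
        return self._obs()
    def _obs(self):
        return torch.tensor(self.counts, dtype=torch.float32)
    def step(self, action):
        if self.counts[action] > 0:
            self.counts[action] -= 1           # operator resolves one conflict
            if random.random() < 0.2:           # ...but may create another
                self.counts[random.randrange(N_TYPES)] += 1
        done = sum(self.counts) == 0
        return self._obs(), -1.0, done          # -1 per step: fewer steps is better

class PolicyValueNet(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(N_TYPES, hidden), nn.Tanh())
        self.pi = nn.Linear(hidden, N_TYPES)    # logits over operators
        self.v = nn.Linear(hidden, 1)
    def forward(self, x):
        h = self.body(x)
        return self.pi(h), self.v(h).squeeze(-1)

def greedy_action(obs):
    """Heuristic teacher (assumed): attack the most frequent conflict type."""
    return int(torch.argmax(obs).item())

net, env = PolicyValueNet(), TimetableEnv()
opt = torch.optim.Adam(net.parameters(), lr=3e-4)

# --- Stage 1: supervised pre-training on greedy-policy samples ------------
for _ in range(200):
    obs, done = env.reset(), False
    while not done:
        a = greedy_action(obs)
        logits, _ = net(obs)
        loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([a]))
        opt.zero_grad(); loss.backward(); opt.step()
        obs, _, done = env.step(a)

# --- Stage 2: PPO fine-tuning with an entropy bonus -----------------------
GAMMA, CLIP, ENT_COEF = 0.99, 0.2, 0.01
for _ in range(300):
    obs_l, act_l, logp_l, rew_l = [], [], [], []
    obs, done = env.reset(), False
    while not done and len(rew_l) < 64:         # short, truncated rollout
        logits, _ = net(obs)
        dist = torch.distributions.Categorical(logits=logits)
        a = dist.sample()
        obs_l.append(obs); act_l.append(a); logp_l.append(dist.log_prob(a).detach())
        obs, r, done = env.step(a.item()); rew_l.append(r)
    ret, rets = 0.0, []                          # returns-to-go as a simple
    for r in reversed(rew_l):                    # advantage baseline
        ret = r + GAMMA * ret; rets.append(ret)
    rets = torch.tensor(list(reversed(rets)))
    S, A, LP = torch.stack(obs_l), torch.stack(act_l), torch.stack(logp_l)
    for _ in range(4):                           # PPO epochs on the rollout
        logits, values = net(S)
        dist = torch.distributions.Categorical(logits=logits)
        ratio = torch.exp(dist.log_prob(A) - LP)
        adv = (rets - values).detach()
        pg = -torch.min(ratio * adv, torch.clamp(ratio, 1 - CLIP, 1 + CLIP) * adv).mean()
        loss = pg + F.mse_loss(values, rets) - ENT_COEF * dist.entropy().mean()
        opt.zero_grad(); loss.backward(); opt.step()
```

Under these assumptions, the model warm-up described in the abstract would simply continue Stage 2 for a few iterations on each new test environment before evaluation, fine-tuning the network parameters in place.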

Keywords: train working diagram; reinforcement learning; PPO algorithm; conflict resolution; heuristic greedy algorithm

CLC Number: U292 [Transportation Engineering: Transportation Planning and Management]

 
