Active Pantograph Control Based on Twin Delayed Deep Deterministic Policy Gradient

Active Pantograph Control of Deep Reinforcement Learning Based on Double Delay Depth Deterministic Strategy Gradient


Authors: Wu Yanbo, Han Zhiwei[1], Wang Hui, Liu Zhigang[1], Zhang Yujing (School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China)

Affiliation: [1] School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756

Source: Transactions of China Electrotechnical Society (《电工技术学报》), 2024, No. 14, pp. 4547-4556 (10 pages)

Funding: National Natural Science Foundation of China (U1734202, 51977182).

Abstract: The coupling performance of the pantograph-catenary system is critical to the current collection quality of high-speed trains. An effective way to improve this coupling performance is active control of the pantograph. Particularly when the speed of low-speed lines is raised and when trains run across multiple lines (mixed running), active control improves the adaptability of the pantograph to the catenary, which effectively reduces line upgrade costs and improves current collection quality. To address the active pantograph control problem, this paper proposes a deep reinforcement learning algorithm for active pantograph control based on the twin delayed deep deterministic policy gradient (TD3). A pantograph-catenary coupling model serves as the environment module of the deep reinforcement learning system, TD3 serves as the behavior control policy of the pantograph, and an effective pantograph control policy is finally obtained by training the controller model. Experimental results show that the proposed method effectively improves pantograph-catenary coupling performance when trains run at high speed on low-speed lines, as well as the adaptability of the pantograph operating on multiple lines, offering a new approach to raising line speeds and to cross-line train operation.

The stable coupling between the pantograph and the catenary is the foundation for the safe operation of high-speed railway trains. As speed increases, contact loss and arcing between the pantograph and the catenary degrade coupling performance, reducing the current collection quality of the train. At present, the primary method for improving current collection quality is active control of the pantograph. The self-adaptability of current control algorithms mainly addresses the adaptive selection of algorithm parameters; however, few studies exist on the impact of changing line conditions and external disturbances. This paper constructs an active pantograph control system based on deep reinforcement learning, which can effectively cope with the complex time-varying characteristics of the pantograph-catenary system and reduce fluctuations of the pantograph-catenary contact force. First, the deep reinforcement learning algorithm is introduced. Then, a pantograph-catenary coupling model is constructed as the environment module, generating data for deep reinforcement learning training and providing feedback on control strategies. The pantograph adopts a three-mass model, and the catenary adopts a nonlinear pole/cable finite element model; the two are coupled through a penalty function. The objectives and existing constraints of the active pantograph control system are analyzed in terms of the state space, observation space, action space, and reward function required by the deep reinforcement learning framework. The process of controller training and testing is presented, and the effectiveness and robustness of the active pantograph control system are verified. The experimental results show that reinforcement learning active control reduces contact force fluctuations at different speeds while leaving the mean contact force almost unchanged. Compared with finite-frequency H∞ control, the standard deviation of the contact force is reduced by 21.8% using the twin delayed deep deterministic policy gradient (TD3) …
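The environment/agent split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the paper couples a three-mass pantograph to a nonlinear finite element catenary, whereas this sketch swaps the catenary for a single lumped contact-wire mass on a spring whose stiffness varies periodically over each span, a common simplified catenary model. The contact force uses the penalty method named in the abstract. All names (`Params`, `PantographEnv`, `td3_target`), parameter values, the observation tuple, the reward shape, and the choice to apply the control force at the lower frame are hypothetical placeholders. `td3_target` shows the two TD3 ingredients that shape the critic target (target policy smoothing and clipped double-Q learning); the third ingredient, delayed actor updates, belongs in the training loop and is only noted in a comment.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Params:
    # Illustrative placeholder values only; NOT the parameters used in the paper.
    m1: float = 7.12      # pantograph head mass [kg]
    m2: float = 6.0       # upper-frame mass [kg]
    m3: float = 5.8       # lower-frame mass [kg]
    k1: float = 9430.0    # head-to-upper-frame stiffness [N/m]
    k2: float = 14100.0   # upper-to-lower-frame stiffness [N/m]
    k3: float = 0.1       # lower-frame-to-ground stiffness [N/m]
    c1: float = 0.0       # damping [N*s/m]
    c2: float = 0.0
    c3: float = 70.0
    F0: float = 70.0      # static uplift force [N]
    mw: float = 5.0       # hypothetical lumped contact-wire mass [kg]
    cw: float = 10.0      # hypothetical contact-wire damping [N*s/m]
    k0: float = 3000.0    # hypothetical mean catenary stiffness [N/m]
    alpha: float = 0.3    # hypothetical stiffness variation over one span
    span: float = 50.0    # hypothetical span length [m]
    kp: float = 5.0e4     # penalty (contact) stiffness [N/m]


class PantographEnv:
    """Hypothetical DRL environment: three-mass pantograph coupled by a penalty
    contact force to a one-DOF catenary whose stiffness varies over each span."""

    def __init__(self, p: Params, speed: float, dt: float = 1e-4):
        self.p, self.v, self.dt = p, speed, dt   # speed in m/s, time step in s
        self.reset()

    def static_contact_force(self) -> float:
        # Equilibrium of the mean-stiffness system: Fc * (1 + k3 * sum of compliances) = F0.
        p = self.p
        compliance = 1 / p.k0 + 1 / p.kp + 1 / p.k1 + 1 / p.k2
        return p.F0 / (1 + p.k3 * compliance)

    def reset(self):
        # Start every episode at the static equilibrium for stiffness k0.
        p, fc = self.p, self.static_contact_force()
        yw = fc / p.k0
        x1 = yw + fc / p.kp
        x2 = x1 + fc / p.k1
        x3 = x2 + fc / p.k2
        self.x = [x1, x2, x3, yw]        # displacements, upward positive [m]
        self.xd = [0.0, 0.0, 0.0, 0.0]   # velocities [m/s]
        self.t = 0.0
        return self.observe()

    def contact_force(self) -> float:
        # Penalty method: force only while the head penetrates the wire (0 = contact loss).
        return self.p.kp * max(self.x[0] - self.x[3], 0.0)

    def observe(self):
        # Hypothetical observation: head displacement, head velocity, contact force.
        return (self.x[0], self.xd[0], self.contact_force())

    def step(self, u: float, f_target: float):
        """Apply control force u [N] at the lower frame for one time step."""
        p, dt = self.p, self.dt
        x1, x2, x3, yw = self.x
        v1, v2, v3, vw = self.xd
        fc = self.contact_force()
        k_cat = p.k0 * (1 + p.alpha * math.cos(2 * math.pi * self.v * self.t / p.span))
        acc = (
            (-p.c1 * v1 - p.k1 * (x1 - x2) - fc) / p.m1,
            (-p.c2 * v2 - p.k1 * (x2 - x1) - p.k2 * (x2 - x3)) / p.m2,
            (-p.c3 * v3 - p.k2 * (x3 - x2) - p.k3 * x3 + p.F0 + u) / p.m3,
            (-p.cw * vw - k_cat * yw + fc) / p.mw,
        )
        # Semi-implicit (symplectic) Euler: velocities first, then positions.
        self.xd = [v + a * dt for v, a in zip(self.xd, acc)]
        self.x = [x + v * dt for x, v in zip(self.x, self.xd)]
        self.t += dt
        # Hypothetical reward: penalise relative deviation from the target force.
        reward = -((self.contact_force() - f_target) / f_target) ** 2
        return self.observe(), reward


def td3_target(r, s_next, done, actor_t, q1_t, q2_t, gamma=0.99,
               sigma=0.2, noise_clip=0.5, a_low=-1.0, a_high=1.0):
    """TD3 critic target; actor_t, q1_t, q2_t stand in for the target networks.
    TD3's delayed actor/target update is handled by the training loop, not here."""
    # Target policy smoothing: clipped Gaussian noise on the target action.
    eps = max(-noise_clip, min(noise_clip, random.gauss(0.0, sigma)))
    a_next = max(a_low, min(a_high, actor_t(s_next) + eps))
    # Clipped double-Q: take the smaller of the two target critics.
    q_next = min(q1_t(s_next, a_next), q2_t(s_next, a_next))
    return r + gamma * (1.0 - done) * q_next
```

For example, `env = PantographEnv(Params(), speed=300 / 3.6)` followed by repeated calls to `env.step(0.0, env.static_contact_force())` gives an uncontrolled baseline rollout against which a controller can be compared. In a full TD3 loop the agent would map each observation to a normalized action (scaled to `u`), store transitions in a replay buffer, regress both critics onto `td3_target`, and update the actor and the target networks only every few critic steps.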

Keywords: low-speed lines; mixed running; twin delayed deep deterministic policy gradient (TD3); active pantograph control

CLC number: TM571 [Electrical Engineering: Electrical Apparatus]

 
