Deep reinforcement learning-PI control strategy for an air rudder servo system based on genetic algorithm optimization (cited by: 5)

Authors: HONG Zi-qi; XU Wen-bo [2]; LV Chen; OUYANG Quan; WANG Zhi-sheng [1]

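Affiliations: [1] School of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, Jiangsu, China; [2] Laboratory of Aerospace Servo Actuation and Transmission, Beijing Institute of Precision Mechatronics and Controls, Beijing 100076, China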

Source: Journal of Mechanical & Electrical Engineering (《机电工程》), 2023, No. 7, pp. 1071-1078 (8 pages)

Funding: Open Fund of the Laboratory of Aerospace Servo Actuation and Transmission (LASAT-20210502).

Abstract: With traditional proportional-integral (PI) control it is difficult to select parameters that give better control performance. To address this problem, a reinforcement learning-PI control method based on genetic algorithm optimization was proposed, with the air rudder servo system as the research object. Firstly, a mathematical model of the air rudder servo system was established. Then, the initial parameters of the PI controller were optimized with a genetic algorithm, and the deep deterministic policy gradient (DDPG) algorithm was used to tune the current PI controller in real time, realizing position command control of the air rudder servo system. Finally, the effect of the method on the air rudder servo system was verified by simulation in Simulink. The results show that the improved algorithm retains a certain degree of online stability under parameter perturbation. In the no-load case, its settling time is shorter than that of the genetic algorithm-PI, DDPG-PI and traditional PI algorithms, by at least 20%. In the loaded case, compared with the other three methods, the fluctuation amplitude and the time to return to steady state after the load ends are reduced by at least 15%. These results demonstrate the effectiveness of the method for the air rudder servo system.
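The first stage described in the abstract, using a genetic algorithm to choose initial PI gains, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the first-order plant (`tau`, `k`), the actuator limit, the ITAE cost, the gain bounds, and the GA operators (truncation selection with elitism, arithmetic crossover, Gaussian mutation). The paper's actual servo model, fitness function and the DDPG online-tuning stage are not reproduced.

```python
import random


def simulate_cost(kp, ki, dt=0.001, t_end=0.5):
    """ITAE of a unit-step response under PI control.

    Hypothetical plant (not the paper's model): tau * dy/dt + y = k * u,
    integrated with forward Euler.
    """
    tau, k = 0.05, 1.0           # assumed plant time constant and gain
    y, integ, cost = 0.0, 0.0, 0.0
    for i in range(int(t_end / dt)):
        t = i * dt
        e = 1.0 - y              # error against a unit step command
        integ += e * dt
        u = kp * e + ki * integ  # PI control law
        u = max(-10.0, min(10.0, u))  # assumed actuator saturation
        y += dt * (k * u - y) / tau
        cost += t * abs(e) * dt  # integral of time-weighted absolute error
    return cost


def genetic_search(bounds, pop_size=20, generations=30, seed=0):
    """Search for [kp, ki] inside `bounds` with a small elitist GA."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: simulate_cost(*g))
        elite = pop[: pop_size // 4]          # keep the best quarter unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            w = rng.random()
            # Arithmetic crossover followed by Gaussian mutation, clipped to bounds.
            child = [w * x + (1 - w) * z for x, z in zip(a, b)]
            child = [
                min(hi, max(lo, c + rng.gauss(0.0, 0.05 * (hi - lo))))
                for c, (lo, hi) in zip(child, bounds)
            ]
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda g: simulate_cost(*g))


if __name__ == "__main__":
    kp, ki = genetic_search([(0.0, 20.0), (0.0, 200.0)])
    print("initial PI gains:", kp, ki, "ITAE:", simulate_cost(kp, ki))
```

In a scheme like the one the abstract outlines, gains found this way would only serve as the starting point; a learning agent would then adjust them online during operation.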

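Keywords: servo system; proportional-integral (PI) controller; genetic algorithm; deep deterministic policy gradient (DDPG) algorithm; parameter optimization; Simulink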

Classification: TH-39 [Mechanical Engineering]; TJ765 [Ordnance Science and Technology: Weapon Systems and Utilization Engineering]
