Reinforcement learning based vibration control of a piezoelectric cantilever beam with time delay


Authors: ZHANG Meng [1], WANG Xiaoyu [2], WEN Hao [1]

Affiliations: [1] State Key Laboratory of Mechanics and Control of Mechanical Structures, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; [2] Beijing Institute of Spacecraft System Engineering, Beijing 100094, China

Source: Journal of Vibration and Shock, 2024, No. 16, pp. 77-83 (7 pages)

Funding: National Key Research and Development Program of China (2020YFA0711700).

Abstract: Time delays are present in almost all control systems; ignoring them can degrade controller performance and even cause divergence. This study therefore investigates how time delay affects the performance of a reinforcement learning (RL) vibration controller. First, a dynamic model of a piezoelectric cantilever beam is established with the finite element method, and its parameters are corrected through experimental identification. Next, the effects of different delay magnitudes on a proportional-derivative (PD) controller and on an RL controller based on proximal policy optimization (PPO) are analyzed in simulation. Then, multiple RL time-delay controllers are trained under different delay conditions, and their control performance is verified in both simulation and experiment. Finally, the robustness of the RL time-delay controllers to deviations in the time delay is evaluated. The results show that each RL time-delay controller not only performs well under the delay it was trained for but also tolerates a certain range of deviation in the actual delay, demonstrating good robustness.
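The abstract's opening claim, that an ignored delay can turn a working controller into a destabilizing one, can be illustrated with a minimal sketch. This is not the finite-element beam model or the PPO controller of the paper: it assumes a single vibration mode with pure velocity (derivative) feedback, and the function name `residual_amplitude` and every parameter value (5 Hz mode, 0.5% damping ratio, gain `kd`) are hypothetical choices made only for illustration.

```python
import math
from collections import deque


def residual_amplitude(delay_s, kd=2.0, freq_hz=5.0, zeta=0.005,
                       dt=1e-4, t_end=2.0):
    """Peak |x| over the final vibration period of the assumed model
    x'' + 2*zeta*w*x' + w^2*x = -kd * x'(t - delay_s), with x(0) = 1."""
    w = 2.0 * math.pi * freq_hz
    delay_steps = int(round(delay_s / dt))
    # FIFO of past velocities; the initial zeros mean "no measurement yet",
    # so the actuator stays idle until the first delayed sample arrives.
    history = deque([0.0] * (delay_steps + 1), maxlen=delay_steps + 1)
    x, v = 1.0, 0.0                             # normalized initial deflection
    n_steps = int(round(t_end / dt))
    window = int(round(1.0 / (freq_hz * dt)))   # samples in one period
    peak = 0.0
    for k in range(n_steps):
        history.append(v)       # newest sample in, oldest one dropped
        u = -kd * history[0]    # feedback acts on the delayed velocity
        a = -2.0 * zeta * w * v - w * w * x + u
        v += a * dt             # semi-implicit (symplectic) Euler step
        x += v * dt
        if k >= n_steps - window:
            peak = max(peak, abs(x))
    return peak


if __name__ == "__main__":
    # Delays of 0, 1/8, 1/4 and 1/2 of the 0.2 s vibration period.
    for tau in (0.0, 0.025, 0.05, 0.1):
        print(f"delay = {tau:5.3f} s -> residual amplitude = "
              f"{residual_amplitude(tau):.3f}")
```

In this toy model the delayed velocity signal is phase-shifted relative to the true velocity, so the damping added by the feedback shrinks as the delay grows and changes sign once the delay exceeds roughly a quarter of the vibration period. A controller that is designed or trained with the delay taken into account is one way to avoid this loss of performance.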

Keywords: reinforcement learning (RL); proximal policy optimization; time delay; vibration control

CLC number: O328 [Science: General Mechanics and Fundamentals of Mechanics]

 
