Data-driven Online Reinforcement Learning Attitude Control Method for Cross-domain Interceptors

Authors: ZHAO Chenyu; XU Biao; SONG Xun; ZHAO Qilun; LI Shuang

Affiliations: [1] College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China [2] Key Laboratory of Space Photoelectric Detection and Perception, Ministry of Industry and Information Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China [3] Shanghai AVICAS Avionics Systems Co. Ltd., Shanghai 201100, China [4] Beijing Institute of Electronic System Engineering, Beijing 100854, China

Source: Aerospace Shanghai (Chinese & English), 2024, No. 6, pp. 39-45 (7 pages)

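Funding: Supported by the Fund of the Key Laboratory of Space Photoelectric Detection and Perception, Ministry of Industry and Information Technology (NJ2022025-05).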

Abstract: Cross-domain interceptors fly over a wide speed range and a large airspace, which makes dynamic modeling difficult and leaves the model unknown. To address this problem, a data-driven online reinforcement learning attitude control method is proposed. First, inspired by the zero-sum game, the disturbance is treated as part of the system input when designing the performance index function: the interceptor's actual control input aims to minimize the performance index and thereby improve system performance, whereas the disturbance acts in the opposite direction. Then, a critic network is constructed to learn the corresponding approximate solution online, and its weights are updated to handle uncertainty dynamically. Unlike traditional model-based online reinforcement learning solution methods, the data-driven reinforcement learning method does not require dynamic model information of the interceptor system; only the system's input and output data drive the online learning and updating of the network weights. Finally, simulations verify the effectiveness of the proposed method.
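The abstract names two ingredients: a zero-sum performance index in which control and disturbance pull in opposite directions, and a critic whose weights are updated online from measured data only. The sketch below illustrates both on a scalar toy problem. It is not the paper's algorithm. The quadratic stage cost, the single feature x², the normalised gradient step on an integral Bellman residual, the fixed feedback gains, and every name and constant (`running_cost`, `critic_update`, `collect_sample`, `train`, `gamma`, `lr`, the toy plant) are assumptions made for illustration. The toy plant appears only inside `collect_sample`, which stands in for recorded flight data; the learner itself sees only states and accumulated cost.

```python
def running_cost(x, u, d, q=1.0, r=1.0, gamma=2.0):
    """Assumed zero-sum stage cost: state and control effort add to the cost,
    disturbance effort enters with a negative sign (weighted by gamma^2)."""
    return q * x * x + r * u * u - gamma ** 2 * d * d


def features(x):
    """Assumed critic basis: V(x) is approximated by w[0] * x^2."""
    return [x * x]


def critic_update(w, phi_now, phi_next, cost_integral, lr=0.5):
    """One normalised gradient step on the integral Bellman residual
    w . (phi_next - phi_now) + cost_integral, using measured data only."""
    sigma = [b - a for a, b in zip(phi_now, phi_next)]
    residual = sum(wi * si for wi, si in zip(w, sigma)) + cost_integral
    norm = (1.0 + sum(s * s for s in sigma)) ** 2
    return [wi - lr * residual * si / norm for wi, si in zip(w, sigma)]


def collect_sample(x0, horizon=0.1, steps=100):
    """Stand-in for recorded data: simulate a hypothetical plant
    x' = -x + u + d under fixed gains and return (x0, xT, integrated cost)."""
    dt = horizon / steps
    x, cost = x0, 0.0
    for _ in range(steps):
        u, d = -1.0 * x, 0.5 * x           # fixed control and disturbance policies
        cost += running_cost(x, u, d) * dt  # rectangle-rule cost integral
        x += (-x + u + d) * dt              # forward-Euler step of the toy plant
    return x0, x, cost


def train(epochs=200):
    """Sweep repeatedly over a small data set and return the learned weights."""
    w = [0.0]
    samples = [collect_sample(x0) for x0 in (0.5, 1.0, 1.5, 2.0)]
    for _ in range(epochs):
        for x0, x_end, cost in samples:
            w = critic_update(w, features(x0), features(x_end), cost)
    return w


if __name__ == "__main__":
    print(train())
```

For this toy setup the closed loop is x' = -1.5x and the stage cost reduces to x², so the exact value function is x²/3; the learned weight should settle close to 1/3, with a small offset from the Euler discretisation. The sketch only evaluates fixed policies; improving the control and disturbance policies from the critic, as a full zero-sum scheme would, is left out.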

Keywords: cross-domain interceptor; online reinforcement learning; data-driven; zero-sum game; uncertainty

Classification: V448 [Aeronautical and Astronautical Science and Technology: Flight Vehicle Design]
