Reinforcement Learning-based Attitude Control for Spacecraft with Reaction Jets: Theory and Experiment (喷气驱动航天器姿态控制强化学习算法及实验)

Cited by: 1

Authors: DU Desong (杜德嵩), LIU Yanfang (刘延芳) [1], YUAN Qiufan (袁秋帆), ZHAO Fuyou (赵福友), QI Naiming (齐乃明) [1]

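Affiliations: [1] School of Astronautics, Harbin Institute of Technology, Harbin 150001, China; [2] Shanghai Aerospace System Engineering Institute, Shanghai 201109, China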

Source: Journal of Astronautics (《宇航学报》), 2024, No. 6, pp. 903-913 (11 pages)

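Funding: National Key R&D Program of China (2022YFB3902701); National Natural Science Foundation of China (52272390); Natural Science Foundation of Heilongjiang Province, Outstanding Youth Project (YQ2022A009).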

Abstract: To address attitude control of reaction-jet-driven spacecraft under thrust amplitude limits, an attitude control algorithm based on reinforcement learning is proposed. The algorithm comprises two neural networks: a control policy network and a Lyapunov network. The control policy network directly outputs the forces of the reaction-jet thrusters, and the thrusts in the training data satisfy the amplitude constraint, which implicitly resolves the thrust allocation optimization and control saturation problems. A sample-based spacecraft attitude stability theorem is introduced into the algorithm design to ensure that the learned control policy is stable, and a proof of stability is provided. Simulation results show that the proposed algorithm offers a significant advantage in agility over mainstream reinforcement learning and traditional attitude control methods. When the control policy is transferred directly to a semi-physical simulation platform, it completes a large-angle maneuvering task, which demonstrates that the policy trained with the proposed algorithm generalizes well and is robust.
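The two-network structure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the network architectures, the training loss, or the exact form of the stability theorem, so everything below is an assumption made for illustration: a 7-dimensional state, 8 thrusters, a force limit `F_MAX`, a sigmoid output layer that keeps each thrust within [0, `F_MAX`], and a generic sample-based Lyapunov decrease condition with a hypothetical rate `alpha`. It is not the authors' implementation.

```python
import numpy as np

F_MAX = 1.0  # assumed per-thruster force limit (hypothetical value)


def init_mlp(rng, sizes):
    """Create small random weights and zero biases for a fully connected net."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Forward pass: tanh hidden layers, linear output layer."""
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b


def policy_thrust(params, state, f_max=F_MAX):
    """Map states to thruster forces; the sigmoid bounds each force in [0, f_max]."""
    return f_max / (1.0 + np.exp(-mlp(params, state)))


def lyapunov_value(params, state):
    """Non-negative Lyapunov candidate: squared norm of the network output."""
    z = mlp(params, state)
    return np.sum(z * z, axis=-1)


def decrease_violation(lyap_params, states, next_states, alpha=0.1):
    """Sample mean of L(s') - L(s) + alpha * L(s).

    A value <= 0 means the candidate decreases on average over the samples
    (an assumed, generic condition; the paper's theorem may differ).
    """
    l_now = lyapunov_value(lyap_params, states)
    l_next = lyapunov_value(lyap_params, next_states)
    return float(np.mean(l_next - l_now + alpha * l_now))
```

Because the output layer itself bounds the thrusts, no separate saturation or allocation step is needed in this sketch, which mirrors the "implicit" handling the abstract mentions. In a training loop, a term such as `decrease_violation` would typically enter the policy objective as a constraint or penalty.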

Keywords: reinforcement learning; attitude control; Lyapunov function; semi-physical simulation

Classification: V448.22 [Aeronautical and Astronautical Science and Technology: Flight Vehicle Design]
