A Spacecraft Rendezvous and Docking Method Based on Inverse Reinforcement Learning

Authors: YUE Chenglei; WANG Xuechuan; YUE Xiaokui [1,2]; SONG Ting

Affiliations: [1] National Key Laboratory of Aerospace Flight Dynamics, Northwestern Polytechnical University, Xi'an 710072, China; [2] School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China; [3] Shanghai Aerospace Control Technology Institute, Shanghai 201109, China; [4] Shanghai Key Laboratory of Space Intelligent Control Technology, Shanghai 201109, China

Source: Acta Aeronautica et Astronautica Sinica, 2023, No. 19, pp. 252-263 (12 pages)

Funding: National Natural Science Foundation of China (U2013206, 11972026).

Abstract: To address the problem of using a neural network to guide a chaser spacecraft toward a static target, a method is proposed in which model predictive control (MPC) provides the expert dataset and the network is trained by generative adversarial inverse reinforcement learning. First, the dynamics of the chaser spacecraft approaching a static target are established under a maximum velocity constraint, a control input saturation constraint, and a space cone constraint, and MPC is used to drive the spacecraft to the specified position. Second, disturbances are added to the nominal trajectory, the trajectories from each starting position to the target point are computed with the same MPC method, and the state and control at every control instant of every trajectory are collected to form a training set of states and corresponding controls. Finally, the network structure and parameters and the training hyperparameters are set, and the network is trained on this training set with generative adversarial inverse reinforcement learning. Simulation results show that generative adversarial inverse reinforcement learning can imitate the behavior of the expert trajectories and successfully trains a neural network that drives the spacecraft from the starting point to the target position.

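Keywords: model predictive control; generative adversarial inverse reinforcement learning; imitation learning; network training; neural network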

Classification: V448.234 [Aeronautical and Astronautical Science and Technology: Flight Vehicle Design]
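
The data-collection step described in the abstract (perturb the starting position, roll out an expert controller, record the state and control at every control instant) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the paper: the double-integrator dynamics, the numerical limits, the cone geometry, and above all the expert, which is a simple saturated velocity-tracking law standing in for the paper's model predictive controller. The adversarial training stage is not shown.

```python
import math
import random

DT = 1.0          # control period in s (assumed)
U_MAX = 0.1       # limit on acceleration magnitude in m/s^2 (assumed)
V_MAX = 0.5       # limit on speed in m/s (assumed)
TAN_CONE = math.tan(math.radians(30.0))  # cone about +x, apex at target (assumed)
K_APPROACH = 0.05  # gain of the stand-in expert in 1/s (assumed)


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def limit_norm(v, bound):
    """Scale v uniformly so that its norm does not exceed bound."""
    n = norm(v)
    return list(v) if n <= bound else [c * bound / n for c in v]


def expert_control(pos, vel):
    """Stand-in expert (NOT the paper's MPC): track a speed-limited velocity
    pointing at the target, using a saturated acceleration command."""
    v_des = limit_norm([-K_APPROACH * p for p in pos], V_MAX)
    a_des = [(vd - v) / DT for vd, v in zip(v_des, vel)]
    return limit_norm(a_des, U_MAX)


def step(pos, vel, u):
    """One step of assumed double-integrator dynamics (velocity first)."""
    vel = [v + a * DT for v, a in zip(vel, u)]
    pos = [p + v * DT for p, v in zip(pos, vel)]
    return pos, vel


def satisfies_constraints(state, u, tol=1e-9):
    """Check the speed limit, input saturation and cone constraint."""
    pos, vel = state[:3], state[3:]
    x, y, z = pos
    in_cone = x >= 0.0 and math.hypot(y, z) <= x * TAN_CONE + tol
    return norm(vel) <= V_MAX + tol and norm(u) <= U_MAX + tol and in_cone


def rollout(start, n_steps=400):
    """Record (state, control) pairs along one expert trajectory from rest."""
    pos, vel = list(start), [0.0, 0.0, 0.0]
    pairs = []
    for _ in range(n_steps):
        u = expert_control(pos, vel)
        pairs.append((tuple(pos + vel), tuple(u)))
        pos, vel = step(pos, vel, u)
    return pairs, pos


def build_dataset(nominal_start=(100.0, 0.0, 0.0), n_traj=5, spread=5.0, seed=0):
    """Perturb the nominal start, roll out the expert, and pool all pairs."""
    rng = random.Random(seed)
    dataset, finals = [], []
    for _ in range(n_traj):
        start = [c + rng.uniform(-spread, spread) for c in nominal_start]
        pairs, final_pos = rollout(start)
        dataset.extend(pairs)
        finals.append(final_pos)
    return dataset, finals


if __name__ == "__main__":
    dataset, finals = build_dataset()
    print(len(dataset), "state-control pairs collected")
```

Each entry of `dataset` pairs a 6-element state (position and velocity relative to the target) with a 3-element control. In a pipeline like the one the abstract describes, pairs of this kind would serve as the expert demonstrations that the discriminator compares against the policy network's own rollouts.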
