Spacecraft Orbital Pursuit-Evasion Game Based on Terminal-Guidance Reinforcement Learning    Cited by: 15

Terminal-guidance Based Reinforcement-learning for Orbital Pursuit-evasion Game of the Spacecraft


Authors: GENG Yuan-Zhuo; YUAN Li; HUANG Huang [1,2]; TANG Liang (Beijing Institute of Control Engineering, Beijing 100094; Science and Technology on Space Intelligent Control Laboratory, Beijing 100094; China Academy of Space Technology, Beijing 100094)

Affiliations: [1] Beijing Institute of Control Engineering, Beijing 100094; [2] Science and Technology on Space Intelligent Control Laboratory, Beijing 100094; [3] China Academy of Space Technology, Beijing 100094

Published in: Acta Automatica Sinica (《自动化学报》), 2023, No. 5, pp. 974-984 (11 pages)

Funding: National Natural Science Foundation of China (U21B6001); China Postdoctoral Science Foundation (2022M722994).

Abstract: This paper addresses the orbital pursuit-evasion game between two multi-impulse spacecraft with strong maneuvering capability, where both the pursuer and the evader act autonomously. A reinforcement-learning based decision-making method is proposed so that the pursuer reaches a specified region adjacent to the evader at an appointed time. First, a mathematical model of the conical safe approach region and of the pursuit-evasion process is established, taking into account practical constraints on both spacecraft: fuel budget, thrust magnitude, decision period, and range of motion. Second, to improve autonomous decision-making under uncertain adversarial scenarios, the pursuer and the evader are trained against each other in self-play within the proximal policy optimization (PPO) framework, alternately improving both policies. On this basis, to complete the pursuit-evasion task at the appointed time, a terminal-guidance reward function design method is proposed: the Clohessy-Wiltshire (CW) equations are used to predict the relative error between the two spacecraft at the terminal time, and this predicted error is incorporated into the reward function, effectively guiding the pursuer into the evader's safe approach region at the appointed time. Compared with existing methods that design the reward from the current error, the proposed method significantly improves the pursuit success rate. Finally, simulation comparisons with other learning methods verify the effectiveness and superiority of the proposed training scheme and reward function design.
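The terminal-guidance reward described in the abstract can be sketched as follows: propagate the pursuer-evader relative state to the appointed terminal time with the closed-form CW solution, and penalize the predicted miss distance rather than the current error. This is a minimal illustrative sketch, not the paper's implementation; the function names, the `capture_radius` threshold, and the reward scaling are assumptions.

```python
import numpy as np

def cw_stm(n, t):
    """Closed-form Clohessy-Wiltshire state-transition matrix Phi(t) for the
    state [x, y, z, vx, vy, vz] (x radial, y along-track, z cross-track),
    with n the mean motion of the reference orbit [rad/s]."""
    s, c = np.sin(n * t), np.cos(n * t)
    return np.array([
        [4 - 3*c,      0, 0,    s/n,          2*(1 - c)/n,     0],
        [6*(s - n*t),  1, 0,   -2*(1 - c)/n,  (4*s - 3*n*t)/n, 0],
        [0,            0, c,    0,            0,               s/n],
        [3*n*s,        0, 0,    c,            2*s,             0],
        [-6*n*(1 - c), 0, 0,   -2*s,          4*c - 3,         0],
        [0,            0, -n*s, 0,            0,               c],
    ])

def terminal_guidance_reward(rel_state, t_go, n, capture_radius=50.0):
    """Predict the relative state at the appointed terminal time by coasting
    the current state over the time-to-go t_go, then penalize the predicted
    terminal position error (capture_radius and scaling are illustrative)."""
    pred = cw_stm(n, t_go) @ rel_state      # ballistic prediction over t_go
    miss = np.linalg.norm(pred[:3])         # predicted terminal miss distance
    return -max(miss - capture_radius, 0.0) / 1000.0
```

Because the penalty depends on the *predicted* terminal error, the signal is informative throughout the episode: an impulse that puts the pursuer on a collision course is rewarded immediately, even while the current separation is still large.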

Keywords: spacecraft pursuit-evasion; intelligent game; proximal policy optimization; reward function design; terminal guidance

Classification: TP18 (Automation and Computer Technology — Control Theory and Control Engineering); V448.2 (Automation and Computer Technology — Control Science and Engineering)
