Multi-stage Reinforcement Learning Method for Orbital Pursuit-Evasion Game of Spacecrafts (航天器轨道追逃博弈多阶段强化学习训练方法)

Cited by: 6

Authors: YUAN Li (袁利)[1,2]; GENG Yuanzhuo (耿远卓); TANG Liang (汤亮); HUANG Huang (黄煌)[1,2]

Affiliations: [1] Beijing Institute of Control Engineering, Beijing 100094, China; [2] Science and Technology on Space Intelligent Control Laboratory, Beijing 100094, China

Source: Aerospace Shanghai (Chinese & English) (《上海航天（中英文）》), 2022, No. 4, pp. 33-41 (9 pages)

Abstract: A multi-stage learning and training method is proposed for the orbital pursuit-evasion game between two spacecraft, in which the pursuer aims to reach a specific region near the evader at the terminal time, while the evader tries to avoid the pursuer through orbital maneuvers. First, a training policy set is constructed for the two spacecraft: rule-based maneuver policies are designed for the pursuer and the evader, in which each side predicts the opponent's terminal position in real time and designs its own expected position and impulse policy accordingly, giving explicit analytical expressions of the pursuit and evasion policies for use in training. Second, to improve training efficiency and the ability to handle unknown environments and uncertain adversaries, a multi-mode, staged training method based on reinforcement learning is proposed. In the pre-training stage, the pursuer and the evader are each trained against an opponent (a rule-based evader and a rule-based pursuer, respectively) driven by the policies above. Third, in the re-training stage, starting from the pre-trained networks, both spacecraft are driven by proximal policy optimization (PPO) policies and play against each other, with the network weights adjusted continually during the game to improve decision-making. Finally, simulations are conducted to verify the effectiveness of the proposed training method. The results show that, after re-training, the pursuer and the evader can effectively cope with opponents driven by different policies, improving the success rates of pursuit and evasion.
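The staged training order described in the abstract can be sketched as a simple schedule of matchups. This is only an illustrative outline, not the authors' implementation: the names `Matchup` and `build_schedule`, the episode counts, and the choice to interleave the two pre-training matchups are all assumptions made here, and no actual PPO update or orbital dynamics is included.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Matchup:
    """One training episode: which policy type drives each spacecraft."""
    stage: str    # "pretrain" or "retrain"
    pursuer: str  # "ppo" (learning network) or "rule" (fixed analytic policy)
    evader: str   # "ppo" (learning network) or "rule" (fixed analytic policy)


def build_schedule(pretrain_episodes: int, retrain_episodes: int) -> List[Matchup]:
    """Build a two-stage schedule (hypothetical helper).

    Stage 1 (pre-training): each learning agent faces a rule-based opponent.
    Stage 2 (re-training): both agents learn while playing each other.
    """
    schedule: List[Matchup] = []
    for _ in range(pretrain_episodes):
        # Learning pursuer against rule-based evader.
        schedule.append(Matchup("pretrain", pursuer="ppo", evader="rule"))
        # Learning evader against rule-based pursuer.
        schedule.append(Matchup("pretrain", pursuer="rule", evader="ppo"))
    for _ in range(retrain_episodes):
        # Both sides driven by learning policies.
        schedule.append(Matchup("retrain", pursuer="ppo", evader="ppo"))
    return schedule


if __name__ == "__main__":
    for m in build_schedule(pretrain_episodes=2, retrain_episodes=3):
        print(m)
```

In a real training loop, each `Matchup` would select which network receives gradient updates during that episode; the rule-based side would be held fixed.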

Keywords: orbital pursuit-evasion; game decision-making; reinforcement learning; training enablement; multi-stage learning

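Classification: TN911.73 [Electronics and Telecommunications: Communication and Information Systems]; TP391.9 [Electronics and Telecommunications: Information and Communication Engineering]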
