Authors: YUAN Li (袁利)[1,2]; GENG Yuanzhuo (耿远卓); TANG Liang (汤亮); HUANG Huang (黄煌)[1,2]
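Affiliations: [1] Beijing Institute of Control Engineering (北京控制工程研究所), Beijing 100094, China; [2] Science and Technology on Space Intelligent Control Laboratory (空间智能控制技术重点实验室), Beijing 100094, China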
Source: Aerospace Shanghai (Chinese & English) (《上海航天(中英文)》), 2022, No. 4, pp. 33-41 (9 pages)
Abstract: A multi-phase learning and training method is proposed for the orbital pursuit-evasion game between two spacecraft, in which the pursuer must reach a specific region near the evader at the terminal time while the evader must avoid the pursuer through orbital maneuvers. First, a training policy set is constructed for the two spacecraft. Rule-based maneuver policies are designed for the pursuer and the evader: each side predicts the other's terminal position in real time, designs its own desired position and impulse policy accordingly, and the resulting pursuit and evasion policies are given as explicit analytical expressions for use in training. Second, to improve training efficiency and the ability to play against unknown adversaries, a multi-mode, staged training method based on reinforcement learning is proposed. In the pre-training stage, the pursuer and the evader are each trained against an opponent driven by the rule-based policies above. Third, in the re-training stage, both spacecraft are driven by proximal policy optimization (PPO) and play against each other, adjusting their network weights continually to improve decision-making. Finally, simulations are conducted to evaluate the proposed training method. The results show that, after re-training, the pursuer and the evader can cope effectively with opponents driven by different policies and achieve higher pursuit and evasion success rates.
Keywords: orbital pursuit-evasion; game decision-making; reinforcement learning; training enablement; multi-phase learning
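Classification: TN911.73 [Electronics and Telecommunications: Communication and Information Systems]; TP391.9 [Electronics and Telecommunications: Information and Communication Engineering]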
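The staged training described in the abstract can be pictured with a small sketch. It is only an illustration under stated assumptions: the abstract gives no implementation details, so the names `Matchup`, `training_schedule`, and `ppo_clipped_term`, the stage labels, and the clipping value `eps=0.2` are all hypothetical. The first function lists who plays whom in each stage and which side updates its network; the second is the standard clipped surrogate term that PPO maximizes per sample, not necessarily the exact form used by the authors.

```python
from typing import List, NamedTuple, Tuple


class Matchup(NamedTuple):
    stage: str                 # "pretrain" or "retrain" (labels are assumptions)
    pursuer: str               # policy driving the pursuer: "ppo" or "rule"
    evader: str                # policy driving the evader: "ppo" or "rule"
    learners: Tuple[str, ...]  # sides whose network weights are updated


def training_schedule() -> List[Matchup]:
    """Pre-train each side against a rule-based opponent, then re-train both."""
    return [
        Matchup("pretrain", "ppo", "rule", ("pursuer",)),
        Matchup("pretrain", "rule", "ppo", ("evader",)),
        Matchup("retrain", "ppo", "ppo", ("pursuer", "evader")),
    ]


def ppo_clipped_term(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for one sample.

    ratio is pi_new(a|s) / pi_old(a|s); eps=0.2 is a common default,
    assumed here rather than taken from the paper.
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    for m in training_schedule():
        print(m.stage, m.pursuer, "vs", m.evader, "-> update", m.learners)
```

In this sketch the clipping limits how far a single update can move either spacecraft's policy from the one that collected the data, which is the usual reason PPO is chosen when two learning agents face each other.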