Authors: 耿远卓 (GENG Yuan-Zhuo); 袁利 (YUAN Li); 黄煌 (HUANG Huang)[1,2]; 汤亮 (TANG Liang) (Beijing Institute of Control Engineering, Beijing 100094; Science and Technology on Space Intelligent Control Laboratory, Beijing 100094; China Academy of Space Technology, Beijing 100094)
Affiliations: [1] Beijing Institute of Control Engineering, Beijing 100094; [2] Science and Technology on Space Intelligent Control Laboratory, Beijing 100094; [3] China Academy of Space Technology, Beijing 100094
Source: Acta Automatica Sinica (《自动化学报》), 2023, No. 5, pp. 974-984 (11 pages)
Funding: Supported by the National Natural Science Foundation of China (U21B6001) and the China Postdoctoral Science Foundation (2022M722994).
Abstract: This paper addresses the orbital pursuit-evasion game between two impulse-thrust spacecraft, both of which are capable of autonomous game play, and proposes a reinforcement-learning based decision-making approach that enables the pursuer to reach a specific region adjacent to the evader at an appointed time. First, practical constraints on both satellites, including fuel limits, thrust limits, decision-period limits, and range of motion, are taken into account, and mathematical models of the conical safe approach region and of the pursuit-evasion process are established. Second, to enhance the spacecraft's autonomous decision-making ability in uncertain adversarial scenarios, the proximal policy optimization (PPO) framework is adopted and the pursuer and the evader are trained together through self-play, alternately improving the decision-making ability of both satellites. On this basis, to accomplish the pursuit-evasion task at the appointed time, a terminal-guidance reward function design method is proposed: the relative error between the two satellites at the terminal time is predicted with the Clohessy-Wiltshire (CW) equations, and this predicted error is incorporated into the reward function, effectively guiding the pursuer into the evader's safe approach region at the appointed time. Compared with existing reward functions designed from the current error, the proposed method significantly improves the pursuit success rate. Finally, simulation comparisons with other learning methods verify the effectiveness and superiority of the proposed training method and reward function design.
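The terminal-guidance reward described in the abstract hinges on coasting the current pursuer-evader relative state to the appointed terminal time with the CW state transition matrix and penalizing the predicted terminal miss. The Python sketch below illustrates that idea only under stated assumptions: the transition matrix is the standard closed-form CW solution for a circular reference orbit, while the function names (cw_state_transition, terminal_guided_reward), the state ordering [x, y, z, vx, vy, vz], the gain k, and the linear penalty form are hypothetical choices, not the paper's exact formulation.

```python
# Illustrative sketch of the terminal-guidance reward idea from the abstract.
# The CW transition matrix is the standard closed-form solution for a circular
# reference orbit; the reward shape, gain, and names are assumptions.
import numpy as np

def cw_state_transition(n: float, t: float) -> np.ndarray:
    """6x6 CW state transition matrix for state [x, y, z, vx, vy, vz]
    (x radial, y along-track, z cross-track), mean motion n, horizon t."""
    s, c = np.sin(n * t), np.cos(n * t)
    return np.array([
        [4 - 3 * c,       0,  0,      s / n,           2 * (1 - c) / n,         0],
        [6 * (s - n * t), 1,  0,      2 * (c - 1) / n, (4 * s - 3 * n * t) / n, 0],
        [0,               0,  c,      0,               0,                       s / n],
        [3 * n * s,       0,  0,      c,               2 * s,                   0],
        [6 * n * (c - 1), 0,  0,     -2 * s,           4 * c - 3,               0],
        [0,               0, -n * s,  0,               0,                       c],
    ])

def terminal_guided_reward(rel_state: np.ndarray, t_now: float,
                           t_final: float, n: float, k: float = 1e-3) -> float:
    """Shaping term built from the *predicted* terminal relative error: coast the
    current relative state to the appointed terminal time and penalize the
    predicted miss distance (assumed linear penalty)."""
    phi = cw_state_transition(n, t_final - t_now)
    predicted = phi @ rel_state                 # ballistic prediction at t_final
    predicted_miss = np.linalg.norm(predicted[:3])
    return -k * predicted_miss                  # smaller predicted miss -> larger reward

# Hypothetical usage: relative state of evader w.r.t. pursuer (m, m/s) in LEO.
if __name__ == "__main__":
    n = 0.0011                                  # mean motion of a ~400 km orbit, rad/s
    rel = np.array([1200.0, -800.0, 50.0, 0.3, -0.1, 0.0])
    print(terminal_guided_reward(rel, t_now=0.0, t_final=1800.0, n=n))
```

The contrast drawn in the abstract is that shaping the reward on this predicted terminal error, rather than on the instantaneous relative error, is what guides the pursuer to arrive at the appointed time and raises the pursuit success rate.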