Spacecraft Rendezvous Guidance Method Based on Safe Reinforcement Learning

Cited by: 1


Authors: XING Linquan, XIAO Yingmin, YANG Zhibin[1,2], WEI Zhengmin, ZHOU Yong, GAO Saijun[3]

Affiliations: [1] School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; [2] Key Laboratory of Safety-critical Software, Ministry of Industry and Information Technology, Nanjing 211106, China; [3] Shanghai Aerospace Electronic Technology Institute, Shanghai 201109, China

Source: Computer Science (《计算机科学》), 2023, No. 8, pp. 271-279 (9 pages)

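Funding: National Natural Science Foundation of China (62072233); National Defense Basic Scientific Research Program (JCKY2020205C006); Nanjing University of Aeronautics and Astronautics Research and Practice Innovation Program (xcxjh20211604).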

Abstract: As spacecraft rendezvous and docking missions grow more complex, the demands on their efficiency, autonomy, and safety are rising sharply. In recent years, applying reinforcement learning to spacecraft rendezvous guidance has become an active frontier of international research. Obstacle avoidance is essential for safe rendezvous and docking, yet general-purpose reinforcement learning algorithms place no safety restrictions on the exploration space, which makes designing rendezvous guidance policies challenging. This paper therefore proposes a spacecraft rendezvous guidance method based on safe reinforcement learning. First, a Markov model of autonomous spacecraft rendezvous in obstacle-avoidance scenarios is designed, and a reward mechanism based on obstacle warning and collision-avoidance constraints is proposed; together these establish a safe reinforcement learning framework for solving the rendezvous guidance policy. Second, within this framework, guidance policies are generated with two deep reinforcement learning algorithms, proximal policy optimization (PPO) and deep deterministic policy gradient (DDPG). Experimental results show that the method effectively avoids obstacles and completes the rendezvous with high accuracy. In addition, an analysis of the performance and generalization ability of the two algorithms further demonstrates the effectiveness of the proposed method.
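The abstract does not state the paper's dynamics model or its exact reward terms. Purely as an illustration of what a "Markov model with an obstacle-warning and collision-avoidance reward" can look like, the minimal sketch below assumes the Clohessy-Wiltshire (Hill) equations for relative motion in the target's local-vertical/local-horizontal frame, and a reward with three hypothetical parts: a dense distance term, a graded penalty inside an obstacle warning zone, and a terminal penalty on entering a keep-out sphere. All constants and the names `State`, `cw_step`, `reward`, and `RendezvousEnv` are hypothetical choices for this example, not the authors' design.

```python
import math
from dataclasses import dataclass

# Hypothetical constants for illustration only; none are taken from the paper.
MEAN_MOTION = 0.0011     # n [rad/s], roughly a low Earth orbit
DT = 1.0                 # integration step [s]
WARNING_RADIUS = 20.0    # obstacle warning-zone radius [m]
KEEP_OUT_RADIUS = 5.0    # collision (keep-out) radius [m]
DOCK_TOLERANCE = 1.0     # distance counted as a completed rendezvous [m]


@dataclass
class State:
    pos: tuple  # chaser position relative to target, LVLH (x radial, y along-track, z cross-track) [m]
    vel: tuple  # relative velocity [m/s]


def cw_step(s, acc, n=MEAN_MOTION, dt=DT):
    """One semi-implicit Euler step of the (assumed) Clohessy-Wiltshire
    equations with a commanded control acceleration acc = (ax, ay, az)."""
    x, y, z = s.pos
    vx, vy, vz = s.vel
    ax, ay, az = acc
    ddx = 3 * n * n * x + 2 * n * vy + ax
    ddy = -2 * n * vx + ay
    ddz = -n * n * z + az
    # update velocity first, then position using the new velocity
    vx, vy, vz = vx + ddx * dt, vy + ddy * dt, vz + ddz * dt
    return State((x + vx * dt, y + vy * dt, z + vz * dt), (vx, vy, vz))


def reward(s, obstacle):
    """Return (reward, done, outcome) for the state reached after a step."""
    d_target = math.dist(s.pos, (0.0, 0.0, 0.0))
    d_obs = math.dist(s.pos, obstacle)
    r = -0.01 * d_target  # dense shaping term: move closer to the target
    if d_obs < KEEP_OUT_RADIUS:
        return r - 100.0, True, "collision"  # constraint violated: end the episode
    if d_obs < WARNING_RADIUS:
        # warning zone: penalty grows from 0 at the zone edge to 1 at the keep-out boundary
        r -= (WARNING_RADIUS - d_obs) / (WARNING_RADIUS - KEEP_OUT_RADIUS)
    if d_target < DOCK_TOLERANCE:
        return r + 100.0, True, "rendezvous"
    return r, False, "running"


class RendezvousEnv:
    """Minimal episodic environment exposing reset/step methods."""

    def __init__(self, start, obstacle):
        self.start = tuple(start)
        self.obstacle = tuple(obstacle)
        self.state = None

    def reset(self):
        self.state = State(self.start, (0.0, 0.0, 0.0))
        return self.state.pos + self.state.vel  # 6-element observation

    def step(self, action):
        self.state = cw_step(self.state, action)
        r, done, outcome = reward(self.state, self.obstacle)
        return self.state.pos + self.state.vel, r, done, outcome


if __name__ == "__main__":
    env = RendezvousEnv(start=(0.0, -100.0, 0.0), obstacle=(0.0, -50.0, 0.0))
    obs = env.reset()
    # a trained PPO or DDPG policy would choose this action from obs
    obs, r, done, outcome = env.step((0.0, 0.01, 0.0))
    print(round(r, 3), done, outcome)
```

In a training loop, an agent such as PPO or DDPG would replace the fixed action and learn to map the 6-element observation to a control acceleration. Making the warning-zone penalty graded rather than binary gives the agent a signal before any collision occurs, while the keep-out check terminates the episode outright; this is one way to express the abstract's "obstacle warning and collision-avoidance constraint" idea as reward shaping, not necessarily the paper's formulation.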

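Keywords: spacecraft rendezvous guidance; obstacle avoidance; safe reinforcement learning; proximal policy optimization; deep deterministic policy gradient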

Classification number: TP311 [Automation and Computer Technology / Computer Software and Theory]
