An Adaptive Hyperparameter Strategy Optimization Method for Spacecraft Rendezvous and Orbital Transfer

Cited by: 1

Authors: SUN Leixiang; GUO Yanning [2]; DENG Wudong [3]; LYU Yueyong [2]; MA Guangfu [2]

Affiliations: [1] Institute of Space Science and Applied Technology, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China; [2] School of Astronautics, Harbin Institute of Technology, Harbin 150001, China; [3] Shanghai Institute of Satellite Engineering, Shanghai 201109, China

Source: Journal of Astronautics (《宇航学报》), 2024, No. 1, pp. 52-62 (11 pages)

Funding: National Natural Science Foundation of China (62273118, 61876050, 61973100).

Abstract: Using reinforcement learning (RL), this paper proposes a hyperparameter-adaptive method for optimizing fuel-optimal rendezvous and orbital transfer strategies for geosynchronous orbit (GEO) spacecraft. First, a Lambert transfer model for GEO spacecraft rendezvous is established. With the maneuver times as the decision variables and fuel consumption as the fitness function, an improved comprehensive learning particle swarm optimization algorithm (ICLPSO) serves as the base method for transfer strategy optimization. Second, to balance the optimality and speed of the solution, the reward function is redesigned to use the particle swarm optimization (PSO) result as its reference baseline. A deep deterministic policy gradient (DDPG) network is trained on a family of typical GEO spacecraft rendezvous scenarios. DDPG is then combined with ICLPSO to form a reinforcement learning particle swarm optimization algorithm (RLPSO), which adjusts the algorithm's hyperparameters dynamically according to the real-time convergence of the iteration. Finally, simulation results show that, compared with PSO and comprehensive learning particle swarm optimization (CLPSO), RLPSO produces planning results with higher fitness after fewer iterations, reducing the computational resources consumed during iteration.
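The idea of adjusting swarm hyperparameters from the observed convergence can be illustrated with a minimal sketch. It is not the paper's RLPSO: the trained DDPG policy is replaced by a hand-written rule (`heuristic_schedule`), the swarm is a plain PSO rather than ICLPSO, and the Lambert fuel-cost model is replaced by a toy quadratic (`toy_cost`). All function names, parameter values, and the form of the feedback signal are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: (search progress in [0, 1), last improvement) -> (w, c1, c2)
Schedule = Callable[[float, float], Tuple[float, float, float]]


def heuristic_schedule(progress: float, improvement: float) -> Tuple[float, float, float]:
    """Stand-in for a learned policy (assumption, not the paper's DDPG)."""
    w = 0.9 - 0.5 * progress          # decay inertia as the search advances
    if improvement <= 0.0:            # swarm stalled: explore a little more
        w = min(0.9, w + 0.1)
    return w, 1.5, 1.5


def pso(objective: Callable[[Sequence[float]], float],
        bounds: Sequence[Tuple[float, float]],
        schedule: Schedule,
        n_particles: int = 20,
        n_iters: int = 100,
        seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `objective`, querying `schedule` once per iteration."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    improvement = 0.0
    for it in range(n_iters):
        # Hyperparameters are refreshed from the convergence feedback.
        w, c1, c2 = schedule(it / n_iters, improvement)
        previous_best = gbest_val
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Clamp the position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        improvement = previous_best - gbest_val
    return gbest, gbest_val


def toy_cost(x: Sequence[float]) -> float:
    """Placeholder objective; a real study would evaluate Lambert transfer fuel cost."""
    return sum((xi - 1.0) ** 2 for xi in x)


if __name__ == "__main__":
    best, val = pso(toy_cost, [(-5.0, 5.0)] * 2, heuristic_schedule)
    print(best, val)
```

Passing the schedule as a callable keeps the swarm loop independent of how the hyperparameters are chosen, so a trained policy could in principle be substituted for the hand-written rule without changing `pso`.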

Keywords: geosynchronous orbit; Lambert transfer; reinforcement learning; particle swarm optimization; deep deterministic policy gradient

Classification: V448.2 [Aerospace Science and Technology: Flight Vehicle Design]
