Authors: SUN Leixiang (孙雷翔), GUO Yanning (郭延宁)[2], DENG Wudong (邓武东)[3], LYU Yueyong (吕跃勇)[2], MA Guangfu (马广富)[2]
Affiliations: [1] Institute of Space Science and Applied Technology, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China; [2] School of Astronautics, Harbin Institute of Technology, Harbin 150001, China; [3] Shanghai Institute of Satellite Engineering, Shanghai 201109, China
Source: Journal of Astronautics (《宇航学报》), 2024, No. 1, pp. 52-62 (11 pages)
Funding: National Natural Science Foundation of China (62273118, 61876050, 61973100).
Abstract: Using reinforcement learning (RL), this paper proposes a hyperparameter-adaptive method for optimizing fuel-optimal rendezvous maneuver strategies for geosynchronous orbit (GEO) spacecraft. First, a Lambert transfer model for GEO spacecraft rendezvous is established. With the maneuver times as decision variables and fuel consumption as the fitness function, an improved comprehensive learning particle swarm optimization algorithm (ICLPSO) serves as the base method for maneuver strategy optimization. Second, to account for both the optimality and the speed of the solution, the reward function is redesigned to use the particle swarm optimization (PSO) result as a reference baseline. A deep deterministic policy gradient (DDPG) neural network is trained on a family of typical GEO spacecraft rendezvous scenarios. DDPG is then combined with ICLPSO to form a reinforcement learning particle swarm optimization algorithm (RLPSO), which adaptively adjusts the algorithm's hyperparameters according to the real-time convergence behavior of the iterations. Finally, simulation results show that, compared with PSO and comprehensive learning particle swarm optimization (CLPSO), RLPSO yields planning results with higher fitness after fewer iterations, reducing the computational resources consumed during iteration.
Keywords: geosynchronous orbit; Lambert transfer; reinforcement learning; particle swarm optimization; deep deterministic policy gradient
Classification: V448.2 [Aerospace Science and Technology: Flight Vehicle Design]
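The abstract describes a particle swarm optimizer whose hyperparameters are adjusted during the run according to how the iterations are converging. The sketch below illustrates only that general structure in Python; it is not the paper's RLPSO. Several pieces are assumptions made for illustration: `toy_cost` is a placeholder quadratic standing in for the Lambert-based fuel cost, `stagnation_schedule` is a hand-written rule standing in for the trained DDPG policy, and plain global-best PSO is used in place of ICLPSO. The sketch minimizes a cost, whereas the paper reports results in terms of fitness.

```python
import random


def toy_cost(x):
    """Placeholder cost (assumption): NOT a Lambert fuel-consumption model."""
    return sum((xi - 1.0) ** 2 for xi in x)


def stagnation_schedule(iteration, max_iter, stagnation):
    """Hypothetical stand-in for a learned policy: returns (w, c1, c2).

    Inertia decays linearly and is raised again when the global best
    has not improved for several iterations.
    """
    frac = iteration / max(1, max_iter - 1)
    w = 0.9 - 0.5 * frac
    if stagnation > 5:
        w = min(0.9, w + 0.2)  # encourage exploration when stuck
    return w, 1.5, 1.5


def pso(cost, bounds, n_particles=20, max_iter=100,
        schedule=stagnation_schedule, seed=0):
    """Global-best PSO (minimization) with per-iteration hyperparameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]  # best cost after initialization and each iteration
    stagnation = 0
    for it in range(max_iter):
        # Hyperparameters are chosen from the current convergence state.
        w, c1, c2 = schedule(it, max_iter, stagnation)
        improved = False
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = cost(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
                    improved = True
        stagnation = 0 if improved else stagnation + 1
        history.append(gbest_val)
    return gbest, gbest_val, history


if __name__ == "__main__":
    best, best_val, hist = pso(toy_cost, [(-5.0, 5.0)] * 2, max_iter=50)
    print(best, best_val)
```

Passing the schedule as a callable keeps the swarm update separate from whatever chooses the hyperparameters, so a learned policy could in principle be substituted for the hand-written rule without touching the optimizer loop.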