UAV Path Planning Based on Improved Deep Deterministic Policy Gradients


Authors: Zhang Sen; Dai Qiangqiang (College of Information Engineering, Henan University of Science and Technology, Luoyang 471023, China)

Affiliation: [1] College of Information Engineering, Henan University of Science and Technology, Luoyang 471023, Henan, China

Source: Journal of System Simulation, 2025, No. 4, pp. 875-881 (7 pages)

Funding: National Natural Science Foundation of China (62271193, 61304144); Natural Science Foundation of Henan Province (222300420433).

Abstract: Aiming at the problems of poor convergence and invalid exploration when UAVs perform path planning in complex environments, an improved deep deterministic policy gradient (DDPG) algorithm is proposed. A dual experience pool mechanism stores successful and failed experiences separately, so the algorithm can exploit successful experiences to strengthen policy optimization while learning from failed experiences to avoid erroneous paths. An artificial potential field (APF) method is introduced to add a guidance term to the planning, which is dynamically integrated with the exploratory noise actions produced during random sampling to determine the selected action. A composite reward function combining direction, distance, obstacle-avoidance, and time rewards achieves multi-objective optimization of the path planning and alleviates the reward-sparsity problem. Experimental results show that the proposed algorithm significantly improves reward and success rate and converges in a shorter time.
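The three mechanisms described in the abstract can be illustrated with a minimal sketch. All class/function names, the sampling ratio, the blending weight `beta`, and the reward weights below are illustrative assumptions, not values from the paper:

```python
import math
import random

class DualReplayBuffer:
    """Dual experience pool: transitions from successful and failed
    episodes are stored separately and sampled in a fixed ratio."""
    def __init__(self, capacity=10000):
        self.success, self.failure = [], []
        self.capacity = capacity

    def add(self, transition, succeeded):
        # Route the transition to the matching pool, evicting the oldest entry.
        pool = self.success if succeeded else self.failure
        pool.append(transition)
        if len(pool) > self.capacity:
            pool.pop(0)

    def sample(self, batch_size, success_ratio=0.5):
        # Draw a mixed minibatch, bounded by what each pool currently holds.
        n_s = min(int(batch_size * success_ratio), len(self.success))
        n_f = min(batch_size - n_s, len(self.failure))
        return random.sample(self.success, n_s) + random.sample(self.failure, n_f)

def apf_guided_action(policy_action, apf_force, noise, beta=0.3):
    """Blend the actor's output with an APF guidance term and
    exploration noise (a hypothetical weighting scheme)."""
    return [(1 - beta) * a + beta * f + n
            for a, f, n in zip(policy_action, apf_force, noise)]

def composite_reward(heading_err, dist_prev, dist_now, obstacle_dist,
                     w_dir=1.0, w_dist=2.0, w_obs=5.0, w_time=0.01,
                     safe_radius=1.0):
    """Composite reward: direction alignment, progress toward the goal,
    obstacle-avoidance penalty, and a per-step time penalty."""
    r_dir = w_dir * math.cos(heading_err)          # reward flying toward the goal
    r_dist = w_dist * (dist_prev - dist_now)       # dense progress signal
    r_obs = -w_obs if obstacle_dist < safe_radius else 0.0
    r_time = -w_time                               # discourage long paths
    return r_dir + r_dist + r_obs + r_time
```

The dense distance and direction terms give feedback at every step, which is how such composite rewards address the sparse-reward problem the abstract mentions; the separate success pool keeps rare goal-reaching transitions from being diluted in a single buffer.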

Keywords: UAV; deep reinforcement learning; path planning; deep deterministic policy gradient; artificial potential field

Classification: TP273 [Automation and Computer Technology: Detection Technology and Automatic Devices]
