Unmanned Aerial Vehicle Formation Control Based on MADDPG with Integrated Curriculum Learning

Authors: WU Kaifeng; LIU Lei[1]; LIU Chen[1]; LIANG Chengqing (School of Mathematics, Hohai University, Nanjing 211100, Jiangsu, China)

Affiliation: [1] School of Mathematics, Hohai University, Nanjing 211100, Jiangsu, China

Source: Computer Engineering (《计算机工程》), 2025, No. 5, pp. 73-82 (10 pages)

Funding: General Program of the Natural Science Foundation of Hebei Province (A2023209002).

Abstract: The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm extends the Deep Deterministic Policy Gradient (DDPG) algorithm and is designed specifically for multi-agent environments. Each agent considers not only its own observations and actions but also the policies of the other agents, which supports better collective decision-making and significantly improves performance and stability in complex, changing environments. Based on the MADDPG framework, this study designs the network structure, state space, action space, and reward function to realize Unmanned Aerial Vehicle (UAV) formation control. To address the difficulty of convergence in multi-agent algorithms, curriculum reinforcement learning is used during training to decompose the task into stages. Progressively layered reward functions are designed for the different task of each stage, and dense rewards are designed using the artificial potential field concept, which greatly reduces training difficulty. Ablation and comparison experiments in a self-built Software in the Loop (SITL) simulation environment verify the effectiveness and stability of the MADDPG algorithm in multi-agent environments. Finally, real-aircraft experiments further verify the practicality of the designed algorithm in a real-world environment.

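Keywords: UAV formation; deep reinforcement learning; multi-agent deep deterministic policy gradient; curriculum learning; neural network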

Classification: TP273 [Automation and Computer Technology: Detection Technology and Automatic Devices]
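
The abstract describes dense rewards built on the artificial potential field concept, with reward functions that grow stage by stage under a curriculum. The record does not give the paper's actual reward formulas, so the following is only a minimal sketch under stated assumptions: the function names, the gains `k_att` and `k_rep`, the safety distance `d_safe`, and the two-stage split (reach the formation slot first, then add collision avoidance) are all hypothetical and chosen for illustration.

```python
import math


def attractive_potential(pos, goal, k_att=1.0):
    """Quadratic attractive potential toward the agent's formation slot (assumed form)."""
    return 0.5 * k_att * math.dist(pos, goal) ** 2


def repulsive_potential(pos, others, d_safe=1.0, k_rep=1.0):
    """Classic repulsive potential; contributes only inside the safety distance."""
    total = 0.0
    for other in others:
        d = max(math.dist(pos, other), 1e-6)  # avoid division by zero
        if d < d_safe:
            total += 0.5 * k_rep * (1.0 / d - 1.0 / d_safe) ** 2
    return total


def dense_reward(prev_pos, pos, goal, others, stage):
    """Per-step reward = decrease in potential between consecutive positions.

    Hypothetical curriculum: stage 1 uses attraction only; stage 2 and later
    also penalize proximity to other agents. For simplicity the other agents'
    positions are treated as fixed over the step.
    """
    def potential(p):
        u = attractive_potential(p, goal)
        if stage >= 2:
            u += repulsive_potential(p, others)
        return u

    return potential(prev_pos) - potential(pos)


# Moving from (2, 0) to (1, 0) toward a slot at the origin,
# with a neighbor hovering at (1, 0.5).
r_stage1 = dense_reward((2.0, 0.0), (1.0, 0.0), (0.0, 0.0), [(1.0, 0.5)], stage=1)
r_stage2 = dense_reward((2.0, 0.0), (1.0, 0.0), (0.0, 0.0), [(1.0, 0.5)], stage=2)
```

Because the reward is a potential difference, every step that moves an agent toward its slot yields a nonzero signal, which is what makes the reward dense. In this sketch the later stage returns a smaller reward for the same move, since approaching a neighbor raises the repulsive term.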
