Authors: WU Kaifeng; LIU Lei[1]; LIU Chen[1]; LIANG Chengqing (School of Mathematics, Hohai University, Nanjing 211100, Jiangsu, China)
Source: Computer Engineering (《计算机工程》), 2025, No. 5, pp. 73-82 (10 pages)
Funding: General Program of the Natural Science Foundation of Hebei Province (A2023209002).
Abstract: The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm extends the Deep Deterministic Policy Gradient (DDPG) algorithm and is designed specifically for multi-agent environments. Each agent considers not only its own observations and actions but also the policies of the other agents, which supports better collective decision-making and significantly improves performance and stability in complex, changing environments. Building on the MADDPG framework, this study designs the network structure, state space, action space, and reward function to achieve Unmanned Aerial Vehicle (UAV) formation control. To address the difficulty of convergence in multi-agent algorithms, curriculum reinforcement learning is used during training to decompose the task into stages; progressively layered reward functions are designed for the task of each stage, and dense rewards based on the artificial potential field idea greatly reduce the training difficulty. Ablation and comparison experiments in a self-built Software in the Loop (SITL) simulation environment verify the effectiveness and stability of the MADDPG algorithm in multi-agent environments. Finally, real-flight experiments further verify the practicality of the designed algorithm in a real-world environment.
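Keywords: UAV formation; deep reinforcement learning; Multi-Agent Deep Deterministic Policy Gradient; curriculum learning; neural network
Classification: TP273 [Automation and Computer Technology: Detection Technology and Automatic Devices]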
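
The abstract mentions dense rewards built on the artificial potential field idea and a curriculum that introduces the task in stages. The record does not give the paper's actual reward terms, so the following is only a minimal sketch of how such a reward might look. The function name `potential_reward`, the parameters `k_att`, `k_rep`, `d_safe`, and the two-stage split are all assumptions made for illustration, not the authors' design.

```python
import math


def potential_reward(pos, goal, neighbors, stage=1,
                     k_att=1.0, k_rep=0.5, d_safe=1.0):
    """Hypothetical dense reward: the negative of a potential-field energy.

    pos, goal  -- coordinate tuples for the agent and its formation slot
    neighbors  -- positions of the other agents
    stage      -- assumed curriculum stage: 0 uses only the attractive term,
                  1 also penalizes getting closer than d_safe to a neighbor
    """
    # Attractive potential: grows quadratically with distance to the goal slot.
    d_goal = math.dist(pos, goal)
    u_att = 0.5 * k_att * d_goal ** 2

    # Repulsive potential: active only inside the safety radius (stage >= 1).
    u_rep = 0.0
    if stage >= 1:
        for other in neighbors:
            d = max(math.dist(pos, other), 1e-6)  # avoid division by zero
            if d < d_safe:
                u_rep += 0.5 * k_rep * (1.0 / d - 1.0 / d_safe) ** 2

    # Lower potential energy means higher reward at every step (dense signal).
    return -(u_att + u_rep)
```

Because the reward changes smoothly with every step toward the goal slot, an agent receives feedback throughout the episode rather than only on success, which is the usual motivation for this kind of shaping. Holding `stage=0` in early training and switching to `stage=1` later is one simple way to mimic a staged curriculum.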