A Multi-UAV Cooperative Exploration Method Based on Improved Multi-Agent PPO

Times cited: 2

Authors: AN Cheng'an [1]; ZHOU Sida [1]

Affiliation: [1] School of Electrical Information Engineering, Yunnan Minzu University, Kunming 650000, China

Source: Electronics Optics & Control (《电光与控制》), 2024, No. 1, pp. 51-56 (6 pages)

Funding: National Natural Science Foundation of China (61963038).

Abstract: Using multiple UAVs to explore an unknown environment can improve the robustness and efficiency of the exploration task. Unlike heuristic methods, multi-agent deep reinforcement learning removes the need to hand-craft rules: each UAV is treated as an agent that learns more effective "rules" on its own by interacting with the environment. A multi-threaded multi-UAV simulation environment is built to support cooperative training of the UAVs. A shared Multi-Agent Proximal Policy Optimization method combined with a Long Short-Term Memory recurrent network (LSTM-MAPPO) is proposed for this multi-threaded environment, and global boundary information is added on top of the cooperative LSTM-MAPPO method to enlarge the area explored in each episode. Numerical experiments show that: 1) compared with the existing Multi-Agent Deep Deterministic Policy Gradient (MADDPG) method, the proposed method still converges stably with continuous actions in the later stage of training; 2) compared with the existing LSTM-MAPPO method, its final reward stays stably above 5000; and 3) on three different simulation maps, the trained network stably explores more than 70% of the area during testing.
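As a rough illustration of the PPO core that a shared MAPPO method builds on, the sketch computes the standard PPO clipped surrogate loss for one policy whose parameters are shared by all UAVs, pooling every agent's transitions into a single batch. It is not the authors' implementation: the transition format `(old_log_prob, new_log_prob, advantage)`, the agent names `uav_0`/`uav_1`, and the clip range `eps=0.2` are assumptions, and the LSTM, the critic, and the global boundary information are all omitted.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-step record (assumption): (old_log_prob, new_log_prob, advantage).
Transition = Tuple[float, float, float]


def clipped_term(old_logp: float, new_logp: float, adv: float, eps: float) -> float:
    """PPO clipped surrogate for one action: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    ratio = math.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * adv, clipped_ratio * adv)


def shared_policy_loss(batches: Dict[str, List[Transition]], eps: float = 0.2) -> float:
    """Negative mean clipped surrogate over all agents' transitions.

    Because every UAV uses the same policy parameters (parameter sharing),
    the transitions of all agents are pooled into one batch.
    """
    terms = [
        clipped_term(old, new, adv, eps)
        for transitions in batches.values()
        for (old, new, adv) in transitions
    ]
    if not terms:
        raise ValueError("no transitions to train on")
    return -sum(terms) / len(terms)  # minimize the negative of the objective


if __name__ == "__main__":
    batches = {
        "uav_0": [(0.0, math.log(1.5), 1.0)],  # ratio 1.5 is clipped to 1.2
        "uav_1": [(0.0, math.log(0.5), 1.0), (0.0, 0.0, -2.0)],
    }
    print(round(shared_policy_loss(batches), 6))  # 0.1
```

With parameter sharing, one network serves every UAV, so adding agents enlarges the training batch rather than the number of networks to train; in a recurrent variant such as LSTM-MAPPO, each agent would presumably also keep its own hidden state along its trajectory.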

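Keywords: multi-UAV cooperation; multi-agent deep reinforcement learning; unknown environment exploration; path planning; multi-threading; long short-term memory (LSTM) network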

CLC number: V279 (Aeronautical and Astronautical Science and Technology / Aircraft Design)

