Authors: AN Cheng'an (安城安); ZHOU Sida (周思达) (School of Electrical Information Engineering, Yunnan Minzu University, Kunming 650000, China)
Affiliation: [1] School of Electrical Information Engineering, Yunnan Minzu University, Kunming 650000, China
Source: Electronics Optics & Control (《电光与控制》), 2024, No. 1, pp. 51-56 (6 pages)
Funding: National Natural Science Foundation of China (61963038).
Abstract: Using multiple UAVs to explore unknown environments can improve the robustness and execution efficiency of exploration tasks. Unlike heuristic methods, multi-agent deep reinforcement learning removes the need for hand-crafted rules: each UAV is treated as an agent that learns more effective "rules" on its own by interacting with the environment. A multi-threaded multi-UAV simulation environment is built to support cooperative training. A shared Multi-Agent Proximal Policy Optimization method combined with a Long Short-Term Memory recurrent network (LSTM-MAPPO) is proposed to suit the multi-threaded environment, and global boundary information is added on top of the cooperative LSTM-MAPPO method to increase the area explored in each episode. Numerical experiments show that: 1) compared with the existing Multi-Agent Deep Deterministic Policy Gradient (MADDPG) method, the proposed method still converges stably in the later stages of training with continuous actions; 2) compared with the existing LSTM-MAPPO method, its final reward stays stably above 5000; and 3) on three different simulation maps, the trained network stably explores more than 70% of the area at test time.
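Keywords: multi-UAV cooperation; multi-agent deep reinforcement learning; unknown environment exploration; path planning; multi-threading; long short-term memory recurrent neural network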
Classification code: V279 [Aerospace Science and Technology: Aircraft Design]