UAV urban high-rise firefighting based on multi-agent proximal policy optimization  (Cited by: 1)

Authors: ZHAO Xiaohu (赵小虎), WU Ruocheng (吴若诚), JIANG Hanli (江涵立)

Affiliations: [1] China Academy of Electronics and Information Technology, China Electronics Technology Group Corporation, Beijing 100041, China; [2] Zhejiang Petrochemical Trading Center (浙江国际油气交易中心有限公司), Zhoushan, Zhejiang 316000, China; [3] Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei, Anhui 241002, China

Source: Journal of Changchun University of Technology (《长春工业大学学报》), 2023, No. 6, pp. 552-562 (11 pages)

Funding: Open Fund of the Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation (CSSAE-2021-004).

Abstract: Urban high-rise firefighting has long been a challenging problem, and using unmanned aerial vehicles (UAVs) to carry out firefighting tasks is an effective solution. In this work, we formulate urban high-rise firefighting as a partially observable Markov decision process (POMDP) and propose a multi-agent proximal policy optimization (MAPPO) algorithm with a β-variational autoencoder (β-VAE) to solve it. MAPPO is a multi-agent extension of proximal policy optimization (PPO) that allows agents to cooperate with each other. Built on an actor-critic architecture, the algorithm uses a critic network that has access to global information and an actor network that shares information among agents. The β-VAE serves as an efficient means of processing visual perception information to assist deep reinforcement learning (DRL). UAVs are rewarded for approaching the fire area and for successfully completing firefighting tasks. To evaluate the proposed method, we build a large-scale, complex urban fire environment based on AirSim and UrbanScene3D and compare our algorithm with multi-agent deep deterministic policy gradient (MADDPG). Experimental results show that the MAPPO algorithm is effective for the urban high-rise firefighting problem and significantly outperforms MADDPG.
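The abstract combines three ingredients: a β-VAE that compresses visual observations, PPO-style actor updates whose advantages come from a centralized critic, and a reward for approaching the fire and completing the task. The Python sketch below shows only the standard textbook forms of these pieces under stated assumptions: a squared-error reconstruction term with an arbitrary default β = 4, the PPO clipped surrogate with ε = 0.2, advantages computed as return minus a global-state value, and a hypothetical distance-based shaped reward. All function names, hyperparameters, and the reward formula are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List, Sequence


def beta_vae_loss(x: Sequence[float], x_recon: Sequence[float],
                  mu: Sequence[float], log_var: Sequence[float],
                  beta: float = 4.0) -> float:
    """Standard beta-VAE objective: squared-error reconstruction plus
    beta * KL(N(mu, sigma^2) || N(0, I)). beta=4.0 is an assumed default."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    # Closed-form KL divergence between a diagonal Gaussian and N(0, I).
    kl = 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
    return recon + beta * kl


def ppo_clip_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                  advantages: Sequence[float], eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate objective, averaged over samples.
    In MAPPO each actor (here assumed to share parameters) minimizes this."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|o) / pi_old(a|o)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: take the smaller of clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def centralized_advantages(returns: Sequence[float],
                           global_values: Sequence[float]) -> List[float]:
    """Advantage A = G - V(s_global). The critic sees the global state during
    training, while actors act on local observations at execution time."""
    return [g - v for g, v in zip(returns, global_values)]


def shaped_reward(prev_dist: float, curr_dist: float, extinguished: bool,
                  k: float = 1.0, bonus: float = 10.0) -> float:
    """HYPOTHETICAL reward, not the paper's formula: reward progress toward
    the fire area plus a terminal bonus when the fire is put out."""
    return k * (prev_dist - curr_dist) + (bonus if extinguished else 0.0)
```

For example, a sample with probability ratio 2 and advantage 1 contributes only 1.2 to the surrogate (clipped at 1 + ε), which is what keeps each policy update close to the previous policy.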

Keywords: UAV navigation; deep reinforcement learning; multi-agent cooperation

CLC number: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]
