Authors: ZHAO Xiaohu; WU Ruocheng; JIANG Hanli
Affiliations: [1] China Academy of Electronics and Information Technology, CETC, Beijing 100041, China; [2] Zhejiang International Oil & Gas Trading Center Co., Ltd., Zhoushan 316000, China; [3] Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei 241002, China
Source: Journal of Changchun University of Technology, 2023, No. 6, pp. 552-562 (11 pages)
Funding: Open Fund of the Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation (CSSAE-2021-004)
Abstract: Urban high-rise firefighting is a challenging problem, and deploying unmanned aerial vehicles (UAVs) is an effective solution. In this work, we formulate urban high-rise firefighting as a partially observable Markov decision process (POMDP) and propose a multi-agent proximal policy optimization (MAPPO) algorithm with a β-variational autoencoder (β-VAE) to solve it. MAPPO is a multi-agent extension of proximal policy optimization (PPO) that allows agents to cooperate with each other. Built on the Actor-Critic architecture, the algorithm employs a critic network containing global information and an actor network with shared information. The β-VAE serves as an efficient means of processing visual perception information to assist deep reinforcement learning (DRL); UAVs are rewarded for approaching the fire area and successfully completing firefighting tasks. To evaluate the proposed method, we build a large-scale, complex urban fire environment based on AirSim and UrbanScene3D and compare our algorithm with multi-agent deep deterministic policy gradient (MADDPG). Experimental results demonstrate that MAPPO is effective for the urban high-rise firefighting problem and significantly outperforms MADDPG.
Classification Number: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]
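The abstract combines two standard objectives: the clipped surrogate loss of PPO (which MAPPO extends to multiple agents) and the β-VAE loss, whose β-weighted KL term regularizes the visual encoder. The following is an illustrative sketch of those two objectives, not the authors' implementation; function names and inputs are assumptions for demonstration.

```python
# Illustrative sketch (not the paper's code): the PPO clipped surrogate
# objective and the beta-VAE loss referenced in the abstract.
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective, returned as a loss to minimize.

    ratio: pi_new(a|s) / pi_old(a|s) per sample; advantage: estimated
    advantages; eps: clipping range around 1.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Take the pessimistic (min) objective, then negate for minimization.
    return -np.minimum(unclipped, clipped).mean()

def beta_vae_loss(recon_error, mu, log_var, beta=4.0):
    """Beta-VAE loss: reconstruction term + beta * KL(q(z|x) || N(0, I)).

    mu, log_var: parameters of the diagonal-Gaussian posterior per sample;
    beta > 1 strengthens the disentanglement pressure on the latent code.
    """
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)
    return recon_error.mean() + beta * kl.mean()
```

With the policy unchanged (ratio = 1) and unit advantage, the clipped loss is -1; with a standard-normal posterior (mu = 0, log_var = 0) the KL term vanishes and only the reconstruction error remains.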