A coordinated rendezvous method for unmanned surface vehicle swarms based on multi-agent reinforcement learning

Original title: 基于多智能体强化学习的无人艇集群集结方法

Cited by: 3

Authors: XIA Jiawei (夏家伟), LIU Zhikun (刘志坤), ZHU Xufang (朱旭芳) [3], LIU Zhong (刘忠) [1]

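Affiliations: [1] School of Weaponry Engineering, Naval University of Engineering, Wuhan 430033, China; [2] Qingdao Campus, Naval Aviation University, Qingdao 266014, China; [3] School of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China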

Source: Journal of Beijing University of Aeronautics and Astronautics (《北京航空航天大学学报》), 2023, No. 12, pp. 3365-3376 (12 pages)

Funding: China Postdoctoral Science Foundation (2016T45686); Natural Science Foundation of Hubei Province (2018CFC865).

Abstract: To address the problem of coordinated rendezvous of a homogeneous unmanned surface vehicle (USV) swarm of variable size into a desired formation, a distributed swarm rendezvous control method based on multi-agent reinforcement learning (MARL) is proposed. To account for the communication and perception constraints of USVs, a dynamic interaction graph of the swarm is built, and a two-dimensional grid encoding of state features is introduced to construct an agent observation space of fixed dimension. Using the multi-agent proximal policy optimization (MAPPO) architecture with centralized training and distributed execution, the state and action spaces of the policy network and the value network are designed separately, and a reward function is defined. A formation rendezvous simulation environment is built, and the method converges effectively after training. Simulation results show that the method achieves fast rendezvous across different desired formations, different swarm sizes, and partial agent failures, which verifies its flexibility and robustness.
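The abstract does not give the details of the two-dimensional grid encoding, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that neighbours within a sensing range are binned into a fixed square grid centred on the agent and aligned with its heading, so that the observation length does not depend on how many neighbours are visible. The function name `encode_neighbors` and the parameters `sense_range` and `grid_size` are hypothetical.

```python
import math

def encode_neighbors(own_xy, own_heading, neighbors_xy,
                     sense_range=100.0, grid_size=8):
    """Bin visible neighbours into a grid_size x grid_size occupancy grid.

    All names and default values are illustrative assumptions; the source
    abstract only states that a 2-D grid encoding gives a fixed-size
    observation.
    """
    cell = 2.0 * sense_range / grid_size          # side length of one cell
    grid = [[0.0] * grid_size for _ in range(grid_size)]
    cos_h, sin_h = math.cos(own_heading), math.sin(own_heading)

    for nx, ny in neighbors_xy:
        dx, dy = nx - own_xy[0], ny - own_xy[1]
        if math.hypot(dx, dy) > sense_range:
            continue  # out of range: treated as no edge in the interaction graph
        # Rotate the offset into the agent's body frame (x forward, y left).
        bx = cos_h * dx + sin_h * dy
        by = -sin_h * dx + cos_h * dy
        # Map [-sense_range, sense_range] to a cell index, clamping the edge.
        col = min(int((bx + sense_range) / cell), grid_size - 1)
        row = min(int((by + sense_range) / cell), grid_size - 1)
        grid[row][col] += 1.0

    # Flatten row by row: the length is always grid_size * grid_size.
    return [value for row_values in grid for value in row_values]

# Two different neighbour counts produce observations of the same length.
obs_few = encode_neighbors((0.0, 0.0), 0.0, [(10.0, 0.0), (0.0, -30.0), (500.0, 500.0)])
obs_many = encode_neighbors((0.0, 0.0), 0.0, [(float(i), float(i)) for i in range(1, 6)])
```

Because the output length depends only on `grid_size`, a policy network with a fixed input layer could in principle be reused when the swarm size changes or when some agents fail, which is the property the abstract attributes to the encoding.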

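Keywords: unmanned surface vehicle; swarm system; multi-agent reinforcement learning; deep reinforcement learning; rendezvous method; proximal policy optimization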

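Classification: U664.82 [Transportation Engineering: Ship and Waterway Engineering]; TP18 [Transportation Engineering: Naval Architecture and Ocean Engineering]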
