Cooperative encirclement method for multiple unmanned ground vehicles based on reinforcement learning

Cited by: 1

Authors: SU Muqing; WANG Yin; PU Ruimin [1]; YU Meng

Affiliations: [1] College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; [2] State Key Laboratory of Mechanics and Control for Aerospace Structures, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

Source: Chinese Journal of Engineering (《工程科学学报》), 2024, No. 7, pp. 1237-1250 (14 pages)

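Funding: Aeronautical Science Foundation of China (ASFC-20175152); Nanjing University of Aeronautics and Astronautics Experimental Technology Research and Development Project (SYJS202311Z).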

Abstract: This paper studies the cooperative encirclement problem for unmanned ground vehicles (UGVs) and proposes a cooperative encirclement algorithm based on the soft actor-critic (SAC) framework. To address poor coordination among multiple UGVs, long short-term memory (LSTM) is added to the network structure to provide a memory function, helping the UGVs use historical observation sequences to make more robust decisions. To address the larger state-space dimension and lower efficiency caused by introducing LSTM, an attention mechanism is introduced: attention weights are computed and selected over the state space so that attention concentrates on the key states relevant to the task. This constrains the state-space dimension and ensures network stability, achieving stable and efficient cooperation among the UGVs and improving the training efficiency of the algorithm. To address sparse rewards in the cooperative encirclement task, a hybrid reward function divides the reward into an individual reward and a cooperative reward, so that the UGVs receive more frequent reward signals during encirclement. The individual reward motivates each UGV's motion by guiding it toward the target, while the cooperative reward encourages the group of UGVs to complete the encirclement task together, further improving the convergence speed of the algorithm. Finally, simulations and experiments show that the method converges faster and, compared with the SAC algorithm, shortens the encirclement time by 15.1% and raises the success rate by 7.6%.

Collaborative encirclement of multiple unmanned ground vehicles (UGVs) is a focal challenge in the realm of multiagent collaborative tasks, representing a fundamental issue in complex undertakings such as multiagent collaborative search and interception. Although optimization algorithms have yielded rich research outcomes in collaborative encirclement, challenges persist, including poor real-time computational efficiency and weak robustness. Reinforcement learning theory holds considerable promise for addressing multiagent sequential decision problems. This paper delves into the study of the collaborative encirclement of multiple UGVs based on deep reinforcement learning theory, focusing on the following key aspects: establishing a kinematic model for UGVs to describe the collaborative encirclement task, detailing the collaborative encirclement process, developing strategies for target UGV escape, and addressing challenges arising from the increasing number of UGVs, which results in a complex environment and issues such as algorithmic instability, dimension explosion, and poor convergence. This paper introduces a collaborative encirclement algorithm based on the soft actor-critic (SAC) framework. To address issues related to poor collaboration and weak generalization among multiple UGVs, long short-term memory is incorporated into the network structure, serving as a memory function for UGVs. This tactic aids in capturing and using information from historical observation sequences, effectively processing time-series data, making more accurate decisions, promoting mutual collaboration among UGVs, and enhancing system stability. To tackle the issue of increased state space dimensions and low training efficiency during collaborative encirclement, an attention mechanism is introduced to calculate and select attention weights in the state space, focusing attention on key states relevant to the task. This strategy helps constrain state space dimensions, ensuring network stability, achieving stable and efficient collaboration among multiple UGVs, and
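The hybrid reward described in the abstract (an individual term that guides each UGV toward the target plus a cooperative term shared by the group) can be illustrated with a minimal sketch. The abstract does not give the paper's actual reward formulas, so everything specific here is an assumption made for illustration: the distance-reduction shaping, the `capture_radius` test for a completed encirclement, the `bonus` value, the weights `w_ind` and `w_coop`, and all function names.

```python
import math


def individual_reward(prev_dist, curr_dist, scale=1.0):
    """Assumed dense term: positive when a pursuer moved closer to the target."""
    return scale * (prev_dist - curr_dist)


def cooperative_reward(pursuers, target, capture_radius=1.0, bonus=10.0):
    """Assumed shared term: paid only when every pursuer is within capture_radius."""
    captured = all(math.dist(p, target) <= capture_radius for p in pursuers)
    return bonus if captured else 0.0


def hybrid_rewards(prev_pursuers, pursuers, target, w_ind=1.0, w_coop=1.0):
    """Return one reward per pursuer: weighted individual term plus shared term."""
    coop = cooperative_reward(pursuers, target)
    rewards = []
    for prev, curr in zip(prev_pursuers, pursuers):
        ind = individual_reward(math.dist(prev, target), math.dist(curr, target))
        rewards.append(w_ind * ind + w_coop * coop)
    return rewards


if __name__ == "__main__":
    target = (0.0, 0.0)
    previous = [(3.0, 0.0), (0.0, 3.0)]
    current = [(2.0, 0.0), (0.0, 3.0)]
    print(hybrid_rewards(previous, current, target))
```

In this sketch the individual term gives a signal at every step, even before any encirclement succeeds, which is the property the abstract attributes to the hybrid design as a remedy for sparse rewards. The cooperative term is identical for all pursuers, so it rewards the group outcome rather than any single vehicle.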

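Keywords: unmanned ground vehicles; cooperative encirclement; soft actor-critic algorithm; attention mechanism; reward function design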

Classification number: TG142.71 [General Industrial Technology: Materials Science and Engineering]
