Authors: SU Muqing, WANG Yin, PU Ruimin[1], YU Meng (College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; State Key Laboratory of Mechanics and Control for Aerospace Structures, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)
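Affiliations: [1] College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106; [2] State Key Laboratory of Mechanics and Control for Aerospace Structures, Nanjing University of Aeronautics and Astronautics, Nanjing 210016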
Source: Chinese Journal of Engineering (《工程科学学报》), 2024, No. 7, pp. 1237-1250 (14 pages)
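Funding: Aeronautical Science Foundation of China (ASFC-20175152); Experimental Technology Research and Development Project of Nanjing University of Aeronautics and Astronautics (SYJS202311Z).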
Abstract: This paper studies the cooperative encirclement problem for unmanned ground vehicles (UGVs) and proposes a cooperative encirclement algorithm built on the soft actor–critic (SAC) framework.

Poor coordination among multiple UGVs is addressed by adding long short-term memory (LSTM) to the network. The LSTM acts as a memory, so each UGV can base more robust decisions on its history of observations.

Adding LSTM enlarges the state-space dimension and lowers efficiency. To counter this, an attention mechanism computes attention weights over the state space and selects the task-relevant key states. This constrains the state-space dimension and keeps the network stable. It also yields stable, efficient cooperation among the UGVs and improves training efficiency.

Sparse rewards in the encirclement task are addressed by a hybrid reward function with two parts: an individual reward and a cooperative reward. With both parts, the UGVs receive reward signals more frequently during encirclement. The individual reward encourages each UGV to move toward the target. The cooperative reward encourages the group to complete the encirclement together, which further speeds up convergence.

Simulations and experiments show that the method converges faster. Compared with the SAC algorithm, it shortens encirclement time by 15.1% and raises the success rate by 7.6%.

Collaborative encirclement of multiple unmanned ground vehicles (UGVs) is a focal challenge among multiagent collaborative tasks. It is a fundamental issue in complex undertakings such as multiagent collaborative search and interception. Optimization algorithms have produced rich results in collaborative encirclement, but challenges persist, including poor real-time computational efficiency and weak robustness. Reinforcement learning theory holds considerable promise for multiagent sequential decision problems.

This paper studies the collaborative encirclement of multiple UGVs based on deep reinforcement learning, focusing on four aspects:
- establishing a kinematic model for UGVs to describe the collaborative encirclement task;
- detailing the collaborative encirclement process;
- developing escape strategies for the target UGV;
- addressing the problems that arise as the number of UGVs grows, namely a more complex environment, algorithmic instability, dimension explosion, and poor convergence.

This paper introduces a collaborative encirclement algorithm based on the soft actor–critic (SAC) framework. Poor collaboration and weak generalization among multiple UGVs are addressed by incorporating long short-term memory into the network structure as a memory function for the UGVs. This helps the UGVs capture and use information from historical observation sequences and process time-series data effectively. As a result, they make more accurate decisions, collaborate more closely, and the system becomes more stable. To tackle the increased state-space dimension and low training efficiency during collaborative encirclement, an attention mechanism is introduced to calculate and select attention weights over the state space, focusing on the key states relevant to the task. This strategy helps constrain the state-space dimension, ensuring network stability, achieving stable and efficient collaboration among multiple UGVs, and
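The attention step described in the abstract can be illustrated with a minimal scaled dot-product attention sketch. The record does not give the paper's network details, so everything below is an assumption:
- Each observed entity (teammate or target) is represented by its own state vector.
- The ego UGV's feature vector serves as the query. In the paper's architecture this might be an LSTM output, which is also an assumption.
- Learned projection matrices are omitted (identity projections) to keep the example dependency-free.

The point it shows is that the output context vector has a fixed length however many entities are observed. This is one way attention can keep the state dimension from growing.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Sequence[float],
           keys: Sequence[Sequence[float]],
           values: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention over a variable number of entity states.

    query:  feature vector of the ego UGV (length d); hypothetical input
    keys:   one length-d vector per observed entity (teammates, target, ...)
    values: one vector per entity; all must share the same length
    Returns (context, weights). context has the length of a single value
    vector, independent of how many entities were observed.
    Identity projections are used here; a real network would learn them.
    """
    if not keys or len(keys) != len(values):
        raise ValueError("need one key per value and at least one entity")
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    # Weighted sum of entity values: high-weight (task-relevant) states dominate.
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights
```

For example, an ego query `[1.0, 0.0]` attending over the entities `[1.0, 0.0]`, `[0.0, 1.0]` and `[-1.0, 0.0]` assigns the largest weight to the first entity. Adding more entities changes only the length of `weights`, not of `context`.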
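The hybrid reward described in the abstract can be sketched as follows: a dense individual term per UGV plus a shared cooperative term. The record does not state the actual formulas or coefficients, so the specific choices below are illustrative assumptions:
- The individual term is the per-step decrease in distance to the target.
- The cooperative term is the fraction of pursuers inside a capture radius.
- A shared bonus is paid once the target is encircled.
- "Encircled" means every pursuer is within the radius and no empty angular sector around the target reaches π.

The names `RewardConfig`, `hybrid_rewards` and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class RewardConfig:
    # Illustrative values only; not the paper's coefficients.
    w_individual: float = 1.0    # weight of the per-vehicle approach term
    w_cooperative: float = 1.0   # weight of the shared "closing in" term
    capture_radius: float = 1.0  # pursuer counts as closing in inside this distance
    max_gap: float = math.pi     # largest allowed empty sector around the target (rad)
    success_bonus: float = 10.0  # shared bonus once the target is encircled


def distance(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def max_angular_gap(positions: Sequence[Point], target: Point) -> float:
    """Largest empty angular sector around the target, in radians."""
    angles = sorted(math.atan2(p[1] - target[1], p[0] - target[0]) for p in positions)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around sector
    return max(gaps)


def is_encircled(positions: Sequence[Point], target: Point, cfg: RewardConfig) -> bool:
    close = all(distance(p, target) <= cfg.capture_radius for p in positions)
    return close and max_angular_gap(positions, target) < cfg.max_gap


def individual_reward(prev_pos: Point, pos: Point, target: Point, cfg: RewardConfig) -> float:
    """Dense per-vehicle term: positive when this pursuer got closer to the target."""
    return cfg.w_individual * (distance(prev_pos, target) - distance(pos, target))


def cooperative_reward(positions: Sequence[Point], target: Point, cfg: RewardConfig) -> float:
    """Shared team term: fraction of pursuers in the capture radius, plus a bonus on encirclement."""
    inside = sum(1 for p in positions if distance(p, target) <= cfg.capture_radius)
    reward = cfg.w_cooperative * inside / len(positions)
    if is_encircled(positions, target, cfg):
        reward += cfg.success_bonus
    return reward


def hybrid_rewards(prev_positions: Sequence[Point], positions: Sequence[Point],
                   target: Point, cfg: RewardConfig = RewardConfig()) -> List[float]:
    """Per-vehicle reward = its own individual term + the same shared cooperative term."""
    shared = cooperative_reward(positions, target, cfg)
    return [individual_reward(pp, p, target, cfg) + shared
            for pp, p in zip(prev_positions, positions)]
```

The individual term produces a nonzero signal on almost every step, which is how a split like this can ease reward sparsity. The shared term gives all pursuers the same incentive to finish the encirclement together.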
Keywords: unmanned ground vehicle; collaborative encirclement; soft actor–critic algorithm; attention mechanism; reward function design
CLC number: TG142.71 [General Industrial Technology / Materials Science and Engineering]