Authors: MA Yue (马悦); WU Lin (吴琳) [3]; XU Xiao (许霄)
Affiliations: [1] Graduate School, National Defense University, Beijing 100091, China; [2] Unit 31002 of the PLA, Beijing 100091, China; [3] Academy of Joint Operation, National Defense University, Beijing 100091, China
Source: Systems Engineering and Electronics (《系统工程与电子技术》), 2023, No. 9, pp. 2793-2801 (9 pages)
Abstract: Traditional methods are difficult to apply to large-scale cooperative target assignment in dynamic, uncertain environments. To address this, a cooperative target assignment model and training method based on multi-agent reinforcement learning is proposed. By describing the relevant concepts and mathematical models, cooperative target assignment is recast as a multi-agent cooperation problem. Focusing on learning the top-level assignment strategy, a strategy scoring model and a strategy reasoning model are constructed, and the Advantage Actor-Critic algorithm is used for strategy optimization. Simulation results show that the proposed method accurately characterizes the internal causes of cooperative evolution among operational units and effectively achieves dynamic generation of large-scale cooperative target assignment schemes.
Keywords: cooperative target assignment; multi-agent cooperation; reinforcement learning; neural network; Advantage Actor-Critic
Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]