Cooperative targets assignment based on multi-agent reinforcement learning  Cited by: 5


Authors: MA Yue; WU Lin[3]; XU Xiao

Affiliations: [1] Graduate School, National Defense University, Beijing 100091, China; [2] Unit 31002 of the PLA, Beijing 100091, China; [3] Academy of Joint Operation, National Defense University, Beijing 100091, China

Source: Systems Engineering and Electronics, 2023, No. 9, pp. 2793-2801 (9 pages)

Abstract: Traditional methods are difficult to apply to large-scale cooperative targets assignment in dynamic and uncertain environments. To address this, a cooperative targets assignment model and training method based on multi-agent reinforcement learning are proposed. By describing the related concepts and mathematical models, cooperative targets assignment is transformed into a multi-agent cooperation problem. Focusing on learning the top-level assignment strategy, a strategy scoring model and a strategy reasoning model are constructed, and the Advantage Actor-Critic algorithm is used for strategy optimization. Simulation results show that the proposed method can accurately characterize the underlying causes of cooperative evolution among operational units and effectively realizes the dynamic generation of large-scale cooperative targets assignment schemes.
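The Advantage Actor-Critic idea named in the abstract can be illustrated with a deliberately small sketch. This is not the paper's model: the paper builds neural-network scoring and reasoning models, whereas the code below assumes a tabular softmax policy per unit (standing in for the actor), a single scalar value estimate (standing in for the critic), and a one-step episode. The `payoff` matrix, the shared team reward, and the names `softmax` and `train_a2c` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def train_a2c(payoff, episodes=3000, lr_actor=0.1, lr_critic=0.1, seed=0):
    """Toy one-step A2C for a unit-to-target assignment.

    payoff[i, j] is a hypothetical benefit of unit i engaging target j.
    Returns the learned policy logits and the critic's value estimate.
    """
    rng = np.random.default_rng(seed)
    n_units, n_targets = payoff.shape
    logits = np.zeros((n_units, n_targets))  # actor: one softmax policy per unit
    value = 0.0                              # critic: value of the single state

    for _ in range(episodes):
        probs = softmax(logits)
        # Each unit samples a target from its own policy.
        actions = np.array(
            [rng.choice(n_targets, p=probs[i]) for i in range(n_units)]
        )
        # Shared team reward (assumption): a target counts once, at the best
        # payoff among the units that chose it, so spreading out is rewarded.
        reward = sum(payoff[actions == j, j].max() for j in np.unique(actions))

        # Advantage = observed return minus the critic's baseline.
        advantage = reward - value

        # Actor update: gradient of log softmax is one_hot(action) - probs.
        one_hot = np.eye(n_targets)[actions]
        logits += lr_actor * advantage * (one_hot - probs)

        # Critic update: move the value estimate toward the observed return.
        value += lr_critic * advantage

    return logits, value
```

After training, a greedy assignment can be read off with `softmax(logits).argmax(axis=1)`. Sharing one team reward across all units is a simplification; it is used here only to show how the advantage term couples the actor and critic updates.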

Keywords: cooperative targets assignment; multi-agent cooperation; reinforcement learning; neural network; Advantage Actor-Critic

Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
