Heterogeneous Multi-Agent Reinforcement Learning Algorithm Integrating Prior Knowledge  (Cited by: 2)


Authors: ZHOU Jiawei; SUN Yuxiang; XUE Yufan; XIANG Qi; WU Ying[1]; ZHOU Xianzhong[1] (Nanjing University, Nanjing 210093, China)

Affiliation: [1] Nanjing University, Nanjing 210093, Jiangsu, China

Source: Command Control & Simulation (《指挥控制与仿真》), 2023, Issue 3, pp. 99-107 (9 pages)

Abstract: In recent years, breakthroughs in machine learning based on deep reinforcement learning have opened a new technical direction for intelligent game confrontation. To address the slow training convergence and large variance in training performance of heterogeneous multi-agent reinforcement learning algorithms in intelligent confrontation, this paper proposes PK-MADDPG, a prior-knowledge-driven multi-agent reinforcement learning algorithm for game confrontation, and constructs a MADDPG model under a double-Critic framework. The model uses prioritized experience replay to optimize the extraction of prior knowledge and achieves notable results in game-confrontation training. The work was applied in the national MaCA heterogeneous multi-agent game confrontation competition, where the confrontation results of PK-MADDPG were compared against a classical rule-based algorithm, verifying the effectiveness of the proposed algorithm.
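The abstract does not give implementation details, but the two ingredients it names, prioritized experience replay seeded with prior knowledge and a double-Critic MADDPG, can be illustrated with a short sketch. The following Python/PyTorch snippet is purely illustrative and is not the authors' code: it shows a proportional prioritized replay buffer that is pre-filled with transitions from a hypothetical rule-based policy (the "prior knowledge"), and a clipped double-Q target built from two critics. For brevity the critics here take a single agent's observation and action; MADDPG's centralized critics would instead take the joint observations and actions of all agents. The names `env`, `rule_policy`, and the dimensions are assumptions introduced for the example.

```python
# Illustrative sketch only, not the paper's implementation.
from collections import namedtuple

import numpy as np
import torch
import torch.nn as nn

Transition = namedtuple("Transition", "obs act rew next_obs done")


class PrioritizedReplay:
    """Proportional prioritized replay; priority ~ (|TD error| + eps)^alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.prios = [], []

    def add(self, transition, priority=1.0):
        if len(self.data) >= self.capacity:  # drop the oldest entry
            self.data.pop(0)
            self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((priority + self.eps) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        probs = np.asarray(self.prios)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)  # importance weights
        weights = weights / weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, torch.as_tensor(weights, dtype=torch.float32)

    def update_priorities(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.prios[i] = (abs(float(e)) + self.eps) ** self.alpha


def seed_with_prior_knowledge(buffer, rule_policy, env, episodes=10, priority=5.0):
    """Pre-fill the buffer with rule-based ("prior knowledge") transitions.

    `rule_policy` and `env` are hypothetical: any callable obs -> action and any
    environment with the classic reset()/step() interface will do. The high
    initial priority makes these transitions likely to be replayed early.
    """
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            act = rule_policy(obs)
            next_obs, rew, done, _ = env.step(act)
            buffer.add(Transition(obs, act, rew, next_obs, done), priority)
            obs = next_obs


def double_critic_target(q1_target, q2_target, next_obs, next_act, rew, done, gamma=0.99):
    """Clipped double-Q target: the smaller of the two target critics' estimates."""
    with torch.no_grad():
        x = torch.cat([next_obs, next_act], dim=-1)
        q_next = torch.min(q1_target(x), q2_target(x)).squeeze(-1)
        return rew + gamma * (1.0 - done) * q_next


# Example critic pair (assumed dimensions; centralized MADDPG critics would
# take the joint observation and joint action of all agents instead).
obs_dim, act_dim = 8, 2
def make_critic():
    return nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
q1_target, q2_target = make_critic(), make_critic()
```

In a full training loop of this kind, each agent's critic pair would be trained on batches drawn from the buffer, with the sampled importance weights scaling the TD loss and `update_priorities` called with the new TD errors, so that the pre-seeded prior-knowledge transitions gradually give way to the agents' own experience.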

Keywords: reinforcement learning; intelligent game; intelligent wargame; MADDPG; multi-agent cooperation

CLC number: TP181 [Automation and Computer Technology - Control Theory and Control Engineering]

 
