Authors: ZHOU Jiawei; SUN Yuxiang; XUE Yufan; XIANG Qi; WU Ying; ZHOU Xianzhong
Affiliation: Nanjing University, Nanjing 210093, Jiangsu, China
Source: Command Control & Simulation, 2023, No. 3, pp. 99-107 (9 pages)
Abstract: In recent years, breakthroughs in machine learning based on deep reinforcement learning have opened a new technical direction for intelligent game confrontation. To address the slow training convergence and large variation in training outcomes of heterogeneous multi-agent reinforcement learning algorithms in intelligent confrontation, this paper proposes PK-MADDPG, a prior-knowledge-driven multi-agent reinforcement learning algorithm for game confrontation, and constructs a MADDPG model under a double-Critic framework. The model uses prioritized experience replay to optimize the extraction of prior knowledge, achieving notable results in game-confrontation training. The work was applied in the national MaCA heterogeneous multi-agent game-confrontation competition, where the confrontation results of PK-MADDPG were compared with those of a classical rule-based algorithm, verifying the effectiveness of the proposed algorithm.
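Note on the method (editorial sketch): the abstract names two technical components, a MADDPG model under a double-Critic framework and prioritized experience replay used to extract prior knowledge, but this record contains no implementation details. The following minimal Python sketch shows a proportional prioritized replay buffer of the kind such a method could build on; the class name, parameters, and all implementation choices here are illustrative assumptions, not the paper's code.

    import numpy as np

    class PrioritizedReplayBuffer:
        """Proportional prioritized experience replay (illustrative sketch,
        not the paper's implementation)."""

        def __init__(self, capacity, alpha=0.6):
            self.capacity = capacity
            self.alpha = alpha                    # strength of prioritization
            self.data = [None] * capacity         # ring buffer of transitions
            self.priorities = np.zeros(capacity)  # one priority per slot
            self.pos = 0                          # next write position
            self.size = 0

        def add(self, transition, td_error=1.0):
            # Give new transitions at least the current max priority so each
            # is sampled at least once before being down-weighted.
            self.priorities[self.pos] = max(abs(td_error),
                                            self.priorities.max(), 1e-6)
            self.data[self.pos] = transition
            self.pos = (self.pos + 1) % self.capacity
            self.size = min(self.size + 1, self.capacity)

        def sample(self, batch_size, beta=0.4):
            # Sampling probability is proportional to priority**alpha.
            p = self.priorities[:self.size] ** self.alpha
            p /= p.sum()
            idx = np.random.choice(self.size, batch_size, p=p)
            # Importance-sampling weights correct the non-uniform sampling bias.
            weights = (self.size * p[idx]) ** (-beta)
            weights /= weights.max()
            return [self.data[i] for i in idx], idx, weights

        def update_priorities(self, idx, td_errors):
            # Refresh priorities with TD errors from the latest learning step.
            self.priorities[idx] = np.abs(td_errors) + 1e-6

In a double-Critic setup such as the one the abstract describes, the TD error fed to update_priorities would typically be computed from the more conservative of the two critics' target values (as in clipped double-Q learning); since the record gives no such details, this sketch covers only the replay mechanism.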
Keywords: reinforcement learning; intelligent game; intelligent wargame; MADDPG; multi-agent cooperation
Classification: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]