Authors: WANG Kun; ZHAO Yingce; WANG Guangyao[2]; LI Jianxun[1]
Affiliations: [1] Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China; [2] Shenyang Aircraft Design and Research Institute, Shenyang 110035, China
Source: Command Control & Simulation, 2024, No. 5, pp. 77-84 (8 pages)
Funding: National Natural Science Foundation of China (61673265); National Key R&D Program of China (2020YFC1512203); Shanghai Commercial Aircraft System Engineering Joint Research Fund (CASEF-2022-MQ01)
Abstract: Improving the training effect of multi-agent systems has long been a focus of reinforcement learning. Building on the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm, a parameter-sharing mechanism is introduced to improve training efficiency. To alleviate the inconsistency between the real reward and the auxiliary reward, a decay factor for the auxiliary reward is proposed, drawing on the idea of curriculum learning, so as to preserve active policy exploration in the early stage of training and reward consistency in the late stage. The improved MATD3 algorithm is applied to combat-vehicle game confrontation to realize intelligent decision-making for the vehicles; the results show that the reward curve of the intelligent vehicle converges stably with good performance. A comparative simulation between the improved and the original MATD3 algorithm further verifies that the improvement effectively increases the convergence speed and the converged reward value.
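The core modification described in the abstract is weighting the auxiliary (shaping) reward by a factor that decays over training, so exploration is encouraged early while the real reward dominates late. Below is a minimal Python sketch of that idea under assumed details: the linear schedule and the names (shaped_reward and its arguments) are illustrative assumptions, not the paper's implementation.

```python
def shaped_reward(real_reward: float, aux_reward: float,
                  episode: int, total_episodes: int) -> float:
    """Combine real and auxiliary rewards with a decaying weight.

    Sketch only: a linear decay from 1 to 0 over training is assumed;
    the paper's exact decay schedule is not given in this record.
    """
    decay = max(0.0, 1.0 - episode / total_episodes)  # assumed linear decay factor
    return real_reward + decay * aux_reward

# Early in training the auxiliary reward contributes almost fully,
# late in training its influence is nearly gone:
print(shaped_reward(1.0, 0.5, episode=10, total_episodes=1000))   # ~1.495
print(shaped_reward(1.0, 0.5, episode=990, total_episodes=1000))  # ~1.005
```

Any monotonically decreasing schedule (e.g., exponential) would serve the same purpose; the key property is that the combined reward converges to the real reward as training ends.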
CLC Number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]