Multi-agent Game Strategy Generation Method Based on Hierarchical Reinforcement Learning


Authors: CHANG Xin; LI Yanbin[1]; LIU Donghui[2,3]

Affiliations: [1] The 54th Research Institute of CETC, Shijiazhuang 050081, Hebei, China; [2] School of Management, Shijiazhuang Tiedao University, Shijiazhuang 050043, Hebei, China; [3] Research Institute of Engineering Management, Shijiazhuang Tiedao University, Shijiazhuang 050043, Hebei, China

Source: Radio Engineering (《无线电工程》), 2024, No. 6, pp. 1361-1367 (7 pages)

Funding: China Postdoctoral Science Foundation (2021M693002); National Natural Science Foundation of China (71991485, 71991481, 71991480).

Abstract: Typical multi-agent confrontation strategy generation methods based on deep reinforcement learning adopt a "decentralized" framework in which each agent generates its strategy and makes decisions from partially observable information. Such methods lack the ability to generate confrontation strategies from a global perspective, which greatly limits decision-making capability. To address this problem, an improved multi-agent game strategy generation method based on hierarchical reinforcement learning is proposed. First, a decision mapping from observation information to overall value is constructed on the basis of hierarchical reinforcement learning, an optimization problem is formulated with maximization of the overall value as the objective, and the strategy optimization process is derived, providing the theoretical basis for the subsequent design of the framework structure and method implementation. Then, based on the decision mapping and the optimization problem, a model framework is designed with neural networks, and the top-level strategy control model and the individual strategy execution models are described in detail. Furthermore, the detailed training procedure and algorithm flow are presented on the basis of the strategy optimization method. Finally, the performance of the proposed method is compared with that of typical multi-agent methods in the StarCraft Multi-Agent Challenge (SMAC) environment. Experimental results demonstrate that the method effectively generates confrontation strategies, enables heterogeneous multi-agent systems to defeat preset opponent strategies, and significantly outperforms typical multi-agent reinforcement learning methods.
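The two-level design described in the abstract (a top-level strategy control model working from a more global view, plus per-agent execution models acting on local observations) can be illustrated with a minimal sketch. The code below is an assumption-laden illustration in PyTorch, not the authors' implementation: the module names, layer sizes, and the command-vector interface between the two levels are all hypothetical.

# Minimal sketch (not the paper's released code) of a two-level hierarchical
# policy: a top-level controller reads the joint observation and emits a
# command vector for each agent; each individual execution policy conditions
# on its local observation plus that command. All dimensions below
# (N_AGENTS, OBS_DIM, CMD_DIM, N_ACTIONS) are illustrative assumptions.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, CMD_DIM, N_ACTIONS = 3, 16, 8, 6  # assumed toy sizes


class TopLevelController(nn.Module):
    """Maps the concatenated (global) observation to one command per agent."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_AGENTS * OBS_DIM, 128), nn.ReLU(),
            nn.Linear(128, N_AGENTS * CMD_DIM),
        )

    def forward(self, joint_obs):                  # (batch, N_AGENTS * OBS_DIM)
        cmds = self.net(joint_obs)
        return cmds.view(-1, N_AGENTS, CMD_DIM)    # (batch, N_AGENTS, CMD_DIM)


class ExecutionPolicy(nn.Module):
    """Per-agent policy over discrete actions, conditioned on obs + command."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + CMD_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, obs, cmd):
        logits = self.net(torch.cat([obs, cmd], dim=-1))
        return torch.distributions.Categorical(logits=logits)


if __name__ == "__main__":
    controller = TopLevelController()
    executors = nn.ModuleList([ExecutionPolicy() for _ in range(N_AGENTS)])

    joint_obs = torch.randn(4, N_AGENTS * OBS_DIM)        # toy batch of 4
    local_obs = joint_obs.view(4, N_AGENTS, OBS_DIM)
    cmds = controller(joint_obs)
    actions = torch.stack(
        [executors[i](local_obs[:, i], cmds[:, i]).sample()
         for i in range(N_AGENTS)],
        dim=1,
    )
    print(actions.shape)  # torch.Size([4, 3]): one discrete action per agent

In such a setup, both levels would be trained jointly against an overall-value objective (for example, a shared team return), which matches the abstract's description only at the block-diagram level; the paper's actual optimization derivation and training flow are given in the full text.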

Keywords: hierarchical reinforcement learning; multi-agent game; deep neural network

CLC Number: TN929.5 [Electronics and Telecommunication - Communication and Information Systems]

 
