Authors: CHANG Xin (畅鑫), LI Yanbin (李艳斌) [1], LIU Donghui (刘东辉) [2,3]
Affiliations: [1] The 54th Research Institute of CETC, Shijiazhuang 050081, Hebei, China; [2] School of Management, Shijiazhuang Tiedao University, Shijiazhuang 050043, Hebei, China; [3] Research Institute of Engineering Management, Shijiazhuang Tiedao University, Shijiazhuang 050043, Hebei, China
Source: Radio Engineering (《无线电工程》), 2024, No. 6, pp. 1361-1367 (7 pages)
Funding: China Postdoctoral Science Foundation (2021M693002); National Natural Science Foundation of China (71991485, 71991481, 71991480).
Abstract: Typical multi-agent confrontation strategy generation methods based on deep reinforcement learning adopt a "decentralized" framework, in which each agent generates strategies and makes decisions from partially observable information. Such methods lack the ability to generate confrontation strategies from a global perspective, which greatly limits their decision-making ability. To address this problem, an improved multi-agent game strategy generation method based on hierarchical reinforcement learning is proposed. First, a decision mapping from observation information to overall value is constructed based on hierarchical reinforcement learning, an optimization problem is formulated with maximization of the overall value as the objective, and the strategy optimization process is derived, providing a theoretical basis for the subsequent design of the framework structure and method implementation. Then, based on the decision mapping and the optimization problem, a model framework is designed using neural networks, and the top-level strategy control model and the individual strategy execution model are described in detail. Furthermore, detailed training and algorithm procedures are given based on the strategy optimization method. Finally, the proposed method is compared with typical multi-agent methods in the StarCraft Multi-Agent Challenge (SMAC) environment. Experimental results show that the method effectively generates confrontation strategies, enables heterogeneous multi-agent systems to defeat preset opponent strategies, and clearly outperforms typical multi-agent reinforcement learning methods.
Classification code: TN929.5 [Electronics and Telecommunications - Communication and Information Systems]