Authors: FU Ke (付可), CHEN Hao (陈浩), WANG Yu (王宇), LIU Quan (刘权), HUANG Jian (黄健) [1]
Affiliation: [1] College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China
Source: Systems Engineering and Electronics (《系统工程与电子技术》), 2025, No. 2, pp. 535-543 (9 pages)
Abstract: To address the non-stationarity problem caused by opponent policy changes in multi-agent competition, this paper proposes an uncertainty-based Bayesian policy reuse algorithm that operates under the restriction that the opponent's actions are unavailable online. In the offline phase, while response policies are being learned, an autoencoder is used to model the relationship representation between agent trajectories and opponent actions, thereby building opponent models. In the online phase, the agent estimates the uncertainty of the opponent's policy type conditioned only on limited interaction information and the built opponent models, and on that basis selects the optimal response policy for reuse. Experiments in two competitive scenarios show that, compared with three state-of-the-art baseline methods, the proposed algorithm achieves higher recognition accuracy and faster recognition speed.
Keywords: multi-agent competition; Bayesian policy reuse; reinforcement learning; relationship representation
Classification: TP301.6 [Automation and Computer Technology / Computer System Architecture]
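
The abstract builds on the classical Bayesian policy reuse (BPR) scheme: keep a belief over known opponent types, update it by Bayes' rule from online observations, and reuse the response policy with the highest expected utility under the current belief. Below is a minimal illustrative sketch in Python; the utility matrix, likelihood values, and function names are hypothetical placeholders, not the paper's implementation, which additionally derives the likelihood term from an autoencoder-based opponent model rather than a fixed table.

    import numpy as np

    # Hypothetical sketch of the classical Bayesian policy reuse (BPR) loop:
    # keep a belief over opponent types, update it with Bayes' rule, and
    # reuse the response policy that maximizes expected utility.

    def update_belief(belief, likelihoods):
        # Bayes' rule: posterior is proportional to p(observation | type) * prior.
        posterior = belief * likelihoods
        total = posterior.sum()
        if total == 0:
            # Degenerate case: no type explains the observation; reset to uniform.
            return np.full_like(belief, 1.0 / belief.size)
        return posterior / total

    def select_policy(belief, utility):
        # utility[i, j]: offline-estimated return of response policy j
        # against opponent type i.
        expected = belief @ utility
        return int(np.argmax(expected))

    # Toy example with 3 opponent types and 3 response policies.
    belief = np.ones(3) / 3                    # uniform prior over opponent types
    utility = np.array([[1.0, 0.2, 0.1],       # assumed offline performance model
                        [0.1, 1.0, 0.3],
                        [0.2, 0.1, 1.0]])
    likelihoods = np.array([0.1, 0.7, 0.2])    # assumed p(observation | type)

    belief = update_belief(belief, likelihoods)
    print(select_policy(belief, utility))      # -> 1, best response to likeliest type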