Authors: FU Ke (付可), CHEN Hao (陈浩), WANG Yu (王宇), LIU Quan (刘权), HUANG Jian (黄健) [1]
Affiliation: [1] College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China
Source: Systems Engineering and Electronics (《系统工程与电子技术》), 2025, No. 2, pp. 535-543 (9 pages)
Abstract: To address the non-stationarity problem caused by opponent policy changes in multi-agent competition, this paper proposes an uncertainty-based Bayesian policy reuse algorithm that operates under the restriction that the opponent's actions are unavailable online. In the offline phase, while response policies are being learned, an autoencoder is used to model the relationship representation between agent trajectories and opponent actions, thereby building opponent models. In the online phase, the agent estimates the uncertainty of the opponent's policy type conditioned only on limited interaction information and the built opponent models, and on that basis selects the optimal response policy for reuse. Experiments in two competitive scenarios show that, compared with three state-of-the-art baseline methods, the proposed algorithm achieves higher recognition accuracy and faster recognition speed.
Keywords: multi-agent competition; Bayesian policy reuse; reinforcement learning; relationship representation
Classification: TP301.6 [Automation and Computer Technology / Computer System Architecture]
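
The abstract builds on the classical Bayesian policy reuse (BPR) scheme: keep a belief over known opponent types, update it by Bayes' rule from online observations, and reuse the response policy with the highest expected utility under the current belief. Below is a minimal illustrative sketch in Python; the utility matrix, likelihood values, and function names are hypothetical placeholders, not the paper's implementation, which additionally derives the likelihood term from an autoencoder-based opponent model rather than a fixed table.

    import numpy as np

    # Hypothetical sketch of the classical Bayesian policy reuse (BPR) loop:
    # keep a belief over opponent types, update it with Bayes' rule, and
    # reuse the response policy that maximizes expected utility.

    def update_belief(belief, likelihoods):
        # Bayes' rule: posterior is proportional to p(observation | type) * prior.
        posterior = belief * likelihoods
        total = posterior.sum()
        if total == 0:
            # Degenerate case: no type explains the observation; reset to uniform.
            return np.full_like(belief, 1.0 / belief.size)
        return posterior / total

    def select_policy(belief, utility):
        # utility[i, j]: offline-estimated return of response policy j
        # against opponent type i.
        expected = belief @ utility
        return int(np.argmax(expected))

    # Toy example with 3 opponent types and 3 response policies.
    belief = np.ones(3) / 3                    # uniform prior over opponent types
    utility = np.array([[1.0, 0.2, 0.1],       # assumed offline performance model
                        [0.1, 1.0, 0.3],
                        [0.2, 0.1, 1.0]])
    likelihoods = np.array([0.1, 0.7, 0.2])    # assumed p(observation | type)

    belief = update_belief(belief, likelihoods)
    print(select_policy(belief, utility))      # -> 1, best response to likeliest type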