Exploration enhancement algorithm based on self-generated expert samples

Enhance exploration with self-generated expert samples

Authors: LIU Jian; ZHAO Heng-yi (School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China)

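Affiliation: [1] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China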

Source: Control Theory & Applications (控制理论与应用), 2023, No. 3, pp. 485-492 (8 pages)

Funding: Supported by the National Natural Science Foundation of China (61906198) and the Natural Science Foundation of Jiangsu Province (BK20190622).

Abstract: To further improve the exploration ability of deep reinforcement learning algorithms in continuous-action environments, and thereby obtain higher rewards, this paper proposes an algorithm named enhance exploration with self-generated expert samples. First, to support the self-generated expert sample mechanism and learning in continuous-action environments, two experience replay buffers are set up on the basis of the twin delayed deep deterministic policy gradient algorithm, forming the overall framework of the deterministic policy algorithm. A combined policy update method is also proposed: an approximately on-policy learning process is added to the inner loop of each episode, during which the agent performs heuristic exploration of the parameter space. Second, a demonstration mechanism based on self-generated expert samples is proposed: the agent itself selects the expert samples, and the selection is adjusted continuously as the parameters are updated, which yields dynamic screening criteria; the agent then learns by imitating these expert samples. Simulation experiments in 8 environments in OpenAI Gym show that the proposed algorithm can effectively improve the exploration ability of deep reinforcement learning.
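The two-buffer idea in the abstract can be illustrated with a rough sketch. The abstract does not state how expert samples are screened, so everything specific here is an assumption made for illustration: the class name `DualReplayBuffer`, the use of an exponential moving average of episode returns as the dynamic threshold, the `momentum` value, and the `expert_fraction` used when sampling. It is not the authors' implementation.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Two replay pools: one for all transitions, one for self-selected
    "expert" transitions. Hypothetical sketch, not the paper's code."""

    def __init__(self, capacity=100_000, expert_capacity=10_000, momentum=0.9):
        self.regular = deque(maxlen=capacity)
        self.expert = deque(maxlen=expert_capacity)
        self.momentum = momentum  # assumed smoothing factor for the threshold
        self.threshold = None     # assumed form of the dynamic screening criterion

    def add_episode(self, transitions, episode_return):
        """Store an episode; return True if it was also kept as expert data."""
        self.regular.extend(transitions)
        # An episode counts as "expert" if its return reaches the current threshold.
        is_expert = self.threshold is None or episode_return >= self.threshold
        if is_expert:
            self.expert.extend(transitions)
        # Move the threshold toward recent returns (exponential moving average),
        # so the criterion changes as the agent's behaviour changes.
        if self.threshold is None:
            self.threshold = episode_return
        else:
            self.threshold = (self.momentum * self.threshold
                              + (1.0 - self.momentum) * episode_return)
        return is_expert

    def sample(self, batch_size, expert_fraction=0.25):
        """Return (regular_batch, expert_batch) drawn without replacement."""
        n_expert = min(int(batch_size * expert_fraction), len(self.expert))
        n_regular = min(batch_size - n_expert, len(self.regular))
        return (random.sample(list(self.regular), n_regular),
                random.sample(list(self.expert), n_expert))
```

In a training loop built on a TD3-style learner, the regular batch would feed the usual critic and actor updates, while the expert batch could feed an additional imitation term; how the paper combines the two is not described in the abstract.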

Keywords: deep reinforcement learning; exploration; expert samples; deterministic policy

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
