Authors: LIU Jian; ZHAO Heng-yi (School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China)
Affiliation: [1] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116
Source: Control Theory & Applications (《控制理论与应用》), 2023, No. 3, pp. 485-492 (8 pages)
Funding: Supported by the National Natural Science Foundation of China (61906198) and the Natural Science Foundation of Jiangsu Province (BK20190622).
Abstract: To further improve the exploration ability of deep reinforcement learning algorithms in continuous action environments, and thereby obtain higher rewards, this paper proposes an exploration-enhancement algorithm based on self-generated expert samples. First, to support the self-generated expert sample mechanism and learning in continuous action spaces, two experience replay buffers are built on top of the twin delayed deep deterministic policy gradient (TD3) algorithm, forming the overall framework of the deterministic policy algorithm. A composite policy update method is also proposed: an approximately on-policy learning process is added to the inner loop of each episode, during which the agent performs heuristic exploration of the parameter space. Second, a demonstration mechanism based on self-generated expert samples is proposed. The agent screens its own experience to produce expert samples, and the screening standard is continuously adjusted as the parameters are updated, yielding a dynamic criterion; the agent then learns by imitating these expert samples. Simulation experiments in eight OpenAI Gym environments show that the proposed algorithm effectively improves the exploration ability of deep reinforcement learning.
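The abstract does not specify the replay-buffer layout or the screening rule, so the following is only a minimal Python sketch of one possible reading, not the authors' implementation. It assumes that every transition goes into an ordinary buffer, and that a whole episode is copied into an expert buffer when its return exceeds a dynamic threshold. Here the threshold is an exponential moving average of past episode returns, which is a stand-in assumption for the paper's parameter-dependent criterion. The class names `ReplayBuffer` and `DualReplay`, the `ema` parameter, and the helper `td3_target` are all hypothetical. `td3_target` shows the standard TD3 clipped double-Q target that the framework is said to build on.

```python
import random
from collections import deque
from typing import Deque, List, Optional, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[list, list, float, list, bool]


class ReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform random sampling."""

    def __init__(self, capacity: int) -> None:
        self.data: Deque[Transition] = deque(maxlen=capacity)

    def add(self, t: Transition) -> None:
        self.data.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        return random.sample(list(self.data), min(batch_size, len(self.data)))

    def __len__(self) -> int:
        return len(self.data)


class DualReplay:
    """Hypothetical two-buffer layout: all transitions go to `ordinary`;
    episodes whose return beats a moving threshold are copied to `expert`."""

    def __init__(self, capacity: int, expert_capacity: int, ema: float = 0.1) -> None:
        self.ordinary = ReplayBuffer(capacity)
        self.expert = ReplayBuffer(expert_capacity)
        self.ema = ema                          # smoothing factor (assumed)
        self.threshold: Optional[float] = None  # dynamic screening criterion
        self._episode: List[Transition] = []

    def add(self, t: Transition) -> bool:
        """Store one step; return True if it ends an episode accepted as expert."""
        self.ordinary.add(t)
        self._episode.append(t)
        if not t[4]:  # episode not finished yet
            return False
        ret = sum(step[2] for step in self._episode)
        accepted = self.threshold is not None and ret > self.threshold
        if accepted:
            for step in self._episode:
                self.expert.add(step)
        # Update the threshold after screening so it tracks recent performance.
        if self.threshold is None:
            self.threshold = ret
        else:
            self.threshold = (1 - self.ema) * self.threshold + self.ema * ret
        self._episode = []
        return accepted


def td3_target(reward: float, done: bool, q1_next: float, q2_next: float,
               gamma: float = 0.99) -> float:
    """TD3 clipped double-Q target: r + gamma * (1 - d) * min(Q1', Q2').
    q1_next/q2_next are assumed to come from the target critics evaluated at the
    smoothed target action (target policy smoothing happens upstream)."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


if __name__ == "__main__":
    buf = DualReplay(capacity=10_000, expert_capacity=1_000)
    print(buf.add(([0.0], [0.1], 1.0, [1.0], True)))   # first episode only sets the threshold
    buf.add(([0.0], [0.2], 2.0, [1.0], False))
    print(buf.add(([1.0], [0.2], 3.0, [2.0], True)))   # return 5.0 beats threshold 1.0
    print(len(buf.ordinary), len(buf.expert))
```

Keeping the expert buffer separate lets a learner draw imitation batches from it independently of the ordinary TD3 batches; how the paper weights the two sources is not stated in the abstract.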
CLC number: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]