Authors: LI Chao, MEN Changqian[1], WANG Wenjian[2]
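Affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; [2] Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006, China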
Source: Journal of Frontiers of Computer Science and Technology, 2020, No. 3, pp. 513-526 (14 pages)
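Funding: National Natural Science Foundation of China (Nos. 61673249, U1805263); Key R&D Program for International Science and Technology Cooperation of Shanxi Province (No. 201903D421050)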
Abstract: Balancing exploration and exploitation is one of the central problems in reinforcement learning research. Exploration helps the agent understand the environment more comprehensively so that it can make better decisions, while exploitation lets the agent make the decision that is currently optimal given its present knowledge of the environment. Most existing exploration algorithms are tied only to the value function and ignore how well the agent currently knows the environment, so their exploration is highly inefficient. To address this problem, this paper proposes RMAX-KNN (reward maximum K-nearest neighbor), a reinforcement learning algorithm based on adaptive discretization of the state space. The algorithm rewrites the value function according to how finely the agent has currently discretized the environment's state space, explores the environment systematically based on this value function, and thereby gradually achieves an adaptive discretization of the state space. By coupling exploration with state-space discretization, RMAX-KNN progressively deepens the agent's knowledge of the environment and improves exploration efficiency. The paper also proves theoretically that RMAX-KNN is a probably approximately correct (PAC) optimal exploration algorithm. Simulation experiments in benchmark domains show that RMAX-KNN adaptively discretizes the environment's state space while exploring it and learns the optimal policy.
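The abstract couples optimistic exploration with a value function that depends on how well the state space has been covered. The general idea can be illustrated with a minimal sketch, assuming an R-MAX-style rule: a state-action pair counts as "known" only when at least k stored samples lie within a distance radius of the query state, and unknown pairs receive the optimistic value r_max / (1 - gamma). This is not the authors' RMAX-KNN algorithm; the class name KnnOptimisticQ and the parameters k, radius, r_max and gamma are hypothetical choices made for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch of R-MAX-style optimism with nearest-neighbour
# "knownness" over a continuous state space. NOT the paper's RMAX-KNN;
# all names and parameters are illustrative assumptions.


class KnnOptimisticQ:
    def __init__(self, k: int, radius: float, r_max: float, gamma: float):
        self.k = k                          # neighbours required for "known"
        self.radius = radius                # max distance of a useful neighbour
        self.v_max = r_max / (1.0 - gamma)  # optimistic value for unknown regions
        # action -> list of (state, value-target) samples; the stored states
        # act as an implicit, data-driven discretization of the state space
        self.samples: Dict[int, List[Tuple[Tuple[float, ...], float]]] = {}

    def add(self, action: int, state: Sequence[float], value: float) -> None:
        """Store one observed (state, value-target) pair for an action."""
        self.samples.setdefault(action, []).append((tuple(state), value))

    def q(self, state: Sequence[float], action: int) -> float:
        """Average of the k nearest samples, or v_max if the region is unknown."""
        data = self.samples.get(action, [])
        nearest = sorted((math.dist(state, s), v) for s, v in data)[: self.k]
        close = [v for d, v in nearest if d <= self.radius]
        if len(close) < self.k:
            # Too few nearby samples: treat as unknown and be optimistic,
            # which drives the agent to explore this region.
            return self.v_max
        return sum(close) / self.k

    def greedy_action(self, state: Sequence[float], actions: Sequence[int]) -> int:
        """Act greedily w.r.t. the optimistic Q; unknown actions look attractive."""
        return max(actions, key=lambda a: self.q(state, a))
```

For example, after storing two nearby samples for action 0, the agent still prefers the unvisited action 1, because its optimistic value r_max / (1 - gamma) exceeds the estimate for action 0. As more samples accumulate, the set of "known" regions grows and the optimism bonus disappears where coverage is sufficient.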
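Keywords: exploration-exploitation balance; value function; adaptive state-space discretization; probably approximately correct (PAC) optimal exploration algorithm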
CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]