PAC最优的RMAX-KNN探索算法  被引量:2

PAC Optimal Exploration Algorithm Named RMAX-KNN

在线阅读下载全文

作  者:李超 门昌骞[1] 王文剑[2] LI Chao;MEN Changqian;WANG Wenjian(School of Computer and Information Technology,Shanxi University,Taiyuan 030006,China;Key Laboratory of Computational Intelligence and Chinese Information Processing(Shanxi University),Ministry of Education,Taiyuan 030006,China)

机构地区:[1]山西大学计算机与信息技术学院,太原030006 [2]计算智能与中文信息处理教育部重点实验室(山西大学),太原030006

出  处:《计算机科学与探索》2020年第3期513-526,共14页Journal of Frontiers of Computer Science and Technology

基  金:国家自然科学基金 Nos.61673249,U1805263;山西省国际科技合作重点研发计划项目 No.201903D421050~~

摘  要:探索与利用的均衡是强化学习研究的重点之一。探索帮助智能体进一步了解环境来做出更优决策;而利用帮助智能体根据其自身当前对于环境的认知来做出当前最优决策。目前大多数探索算法只与值函数相关联,不考虑当前智能体对于环境的认知程度,探索效率极低。针对此问题,提出了一种基于状态空间自适应离散化的RMAX-KNN强化学习算法,算法根据当前智能体对于环境状态空间的离散化程度改写值函数形式,然后基于此值函数对环境进行合理的探索,逐步实现对于环境状态空间的自适应离散化划分。RMAXKNN算法通过将探索与环境状态空间离散化相结合,逐渐加深智能体对于环境的认知程度,进而提高探索效率,同时在理论上证明该算法是一种概率近似正确(PAC)最优探索算法。在Benchmark环境上的仿真实验结果表明,RMAX-KNN算法可以在探索环境的同时实现对于环境状态空间的自适应离散化,并学习到最优策略。The balance of exploration and exploitation is one of the focuses of reinforcement learning research.The exploration helps the agent understand the environment more comprehensively and make better decisions while the exploitation helps the agent make current optimal decisions based on its current cognition of the environment.At present,most of the exploration algorithms are only associated with the value function,regardless of the agent’s current cognitive level of the environment,so the efficiency of the exploration is extremely low.Aiming at solving this problem,this paper proposes a reinforcement learning algorithm named RMAX-KNN(reward maximum K-nearest neighbor)based on the adaptive discretization of the state space.The algorithm rewrites the value function according to the level of discretization of the state space and makes the agent explore the environment reasonably,gradually achieving the adaptive discretization of the environmental state space.By combining the exploration with the discretization of the environmental state space,the RMAX-KNN algorithm gradually raises the cognitive level of the agent in terms of the environment and increases the efficiency of exploration.At the same time,this algorithm is proven to be a probably approximately correct(PAC)optimal exploration algorithm theoretically.The simulation experiments in the Benchmark domains show that the RMAX-KNN algorithm can achieve the adaptive discretization of the environmental state space while exploring the environment and developing the optimal strategy.

关 键 词:探索与利用的均衡 值函数 状态空间自适应离散化 概率近似正确(PAC)最优探索算法 

分 类 号:TP301.6[自动化与计算机技术—计算机系统结构]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象