SPaRM: an efficient exploration and planning framework for sparse reward reinforcement learning  


Authors: BAN Jian (班健), LI Gongyan, XU Shaoyun (Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, P.R. China; University of Chinese Academy of Sciences, Beijing 100049, P.R. China)

Affiliations: [1] Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, P.R. China [2] University of Chinese Academy of Sciences, Beijing 100049, P.R. China

Source: High Technology Letters (高技术通讯(英文版)), 2024, No. 4, pp. 344-355 (12 pages)

Funding: Supported by the International Partnership Program of Chinese Academy of Sciences (No. 184131KYSB20200033).

Abstract: Long horizons force reinforcement learning (RL) to visit the state space a substantial number of times during the exploration phase to gather valuable information. Additionally, because rewards are sparse, the planning phase spends considerable time on repetitive and unproductive tasks before it adequately accesses the sparse reward signals. To address these challenges, this work proposes a space partitioning and reverse merging (SPaRM) framework based on reward-free exploration (RFE). The framework consists of two parts: the space partitioning module and the reverse merging module. The former partitions the entire state space into a specific number of subspaces to expedite the exploration phase, and this work establishes its theoretical sample complexity lower bound. The latter starts planning in reverse from near the target and gradually extends to the starting state, as opposed to the conventional practice of starting at the beginning. This brings the sparse rewards at the target into the policy update process early. This work designs two experimental environments: a complex maze and a set of randomly generated maps. Experimental comparisons with two state-of-the-art (SOTA) algorithms validate the effectiveness and superior performance of the proposed algorithm.
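The abstract gives no pseudocode, so the following is only a toy sketch of the two ideas it names, not the authors' SPaRM algorithm. It assumes a deterministic grid maze whose transitions are already known (the reward-free exploration phase is skipped entirely), square tiles as the subspaces, and plain value iteration as the planner. The names `partition`, `neighbors`, and `reverse_merge_plan`, and the parameters `block`, `gamma`, and `sweeps`, are all hypothetical.

```python
from itertools import product


def partition(width, height, block):
    """Map every cell to the id of the block-by-block tile that contains it."""
    return {(x, y): (x // block, y // block)
            for x, y in product(range(width), range(height))}


def neighbors(cell, width, height, walls):
    """Yield the in-bounds, non-wall cells one step away from `cell`."""
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nxt = (x + dx, y + dy)
        if 0 <= nxt[0] < width and 0 <= nxt[1] < height and nxt not in walls:
            yield nxt


def reverse_merge_plan(width, height, walls, goal, block=2, gamma=0.9, sweeps=50):
    """Plan outward from the goal tile, merging in one tile at a time."""
    tile_of = partition(width, height, block)
    goal_tile = tile_of[goal]
    # Assumption: tiles are merged in order of tile-level Manhattan distance
    # from the goal tile, nearest first.
    tiles = sorted(set(tile_of.values()),
                   key=lambda t: abs(t[0] - goal_tile[0]) + abs(t[1] - goal_tile[1]))
    values = {cell: 0.0 for cell in tile_of if cell not in walls}
    values[goal] = 1.0  # sparse reward: only the goal carries value initially
    active = set()
    for tile in tiles:
        # Merge the next tile into the region that planning is allowed to use.
        active |= {c for c, t in tile_of.items() if t == tile and c not in walls}
        for _ in range(sweeps):
            for cell in active:
                if cell == goal:
                    continue
                best = max((values[n]
                            for n in neighbors(cell, width, height, walls)
                            if n in active), default=0.0)
                values[cell] = gamma * best
    return values


# A 4x4 maze with a wall segment between the start (0, 0) and the goal (3, 0).
values = reverse_merge_plan(4, 4, walls={(1, 0), (1, 1), (1, 2)}, goal=(3, 0))
print(values[(0, 0)])
```

In this sketch the goal's reward shapes the values of nearby cells in the very first stage, and each later merge only has to extend an already informative value function toward the start, which is the intuition the abstract attributes to reverse merging.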

Keywords: reinforcement learning (RL); sparse reward; reward-free exploration (RFE); space partitioning (SP); reverse merging (RM)

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
