LiFE: Deep Exploration via Linear-Feature Bonus in Continuous Control

Authors: Jiantao Qiu, Yu Wang

Affiliation: [1] Department of Electronic Engineering and Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing 100084, China

Source: Tsinghua Science and Technology, 2023, Issue 1, pp. 155-166 (12 pages)

Abstract: Reinforcement Learning (RL) algorithms work well with well-defined rewards, but they fail under sparse or deceptive rewards and require additional exploration strategies. This work introduces a deep exploration method based on an Upper Confidence Bound (UCB) bonus. The proposed method can be plugged into any actor-critic algorithm that uses a deep neural network as the critic. Building on the regret bound established under the linear Markov Decision Process (MDP) approximation, we use the feature matrix to compute a UCB bonus for deep exploration. The proposed method is equivalent to count-based exploration in special cases and remains applicable in general settings. Our method uses the last d-dimensional feature vector of the critic network and is easy to deploy. We design a simple task, "swim", to illustrate how the proposed method achieves exploration in sparse/deceptive-reward environments. We then perform an empirical evaluation on sparse/deceptive-reward versions of Gym environments and on Ackermann robot control tasks. The results verify that the proposed algorithm performs effective deep exploration in sparse/deceptive-reward tasks.
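
The bonus the abstract describes follows the standard linear-MDP (LSVI-UCB-style) form: maintain a regularized feature covariance Lambda = lam * I + sum of phi phi^T over visited state-action features, and award b(s, a) = beta * sqrt(phi^T Lambda^{-1} phi). Below is a minimal, hypothetical Python sketch of that bonus computed from a critic's last d-dimensional feature vector; the class name, hyperparameters, and the Sherman-Morrison update are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class LinearFeatureBonus:
    """UCB-style exploration bonus from linear features (a sketch, not the
    paper's code): b(phi) = beta * sqrt(phi^T Lambda^{-1} phi), where
    Lambda = lam * I + sum of phi_i phi_i^T over visited features."""

    def __init__(self, d: int, beta: float = 1.0, lam: float = 1.0):
        self.beta = beta
        self.cov_inv = np.eye(d) / lam  # Lambda^{-1}, updated incrementally

    def update(self, phi: np.ndarray) -> None:
        # Rank-1 Sherman-Morrison update of Lambda^{-1} after visiting phi.
        v = self.cov_inv @ phi
        self.cov_inv -= np.outer(v, v) / (1.0 + phi @ v)

    def bonus(self, phi: np.ndarray) -> float:
        # Elliptical potential: large in feature directions rarely visited.
        return self.beta * float(np.sqrt(phi @ self.cov_inv @ phi))

# Usage sketch: phi would be the critic network's last d-dimensional
# hidden-layer feature vector for the current state-action pair.
b = LinearFeatureBonus(d=4)
phi = np.array([1.0, 0.0, 0.0, 0.0])
print(b.bonus(phi))  # 1.0: an unvisited direction gets a large bonus
b.update(phi)
print(b.bonus(phi))  # ~0.707: the bonus shrinks after the visit
```

With one-hot features, phi^T Lambda^{-1} phi reduces to 1 / (lam + N(s, a)), so the bonus becomes beta / sqrt(lam + N(s, a)), which recovers the count-based exploration bonus that the abstract mentions as a special case.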

Keywords: Reinforcement Learning (RL); Neural Network (NN); Upper Confidence Bound (UCB)

Classification: TN9 [Electronics and Telecommunications: Information and Communication Engineering]

 
