Authors: Jiantao Qiu, Yu Wang
Source: Tsinghua Science and Technology, 2023, No. 1, pp. 155-166 (12 pages); Journal of Tsinghua University (Natural Science Edition) (English Edition)
Abstract: Reinforcement Learning (RL) algorithms work well with well-defined rewards, but they fail with sparse or deceptive rewards and require additional exploration strategies. This work introduces a deep exploration method based on an Upper Confidence Bound (UCB) bonus. The proposed method can be plugged into actor-critic algorithms that use a deep neural network as the critic. Building on the regret bound derived under the linear Markov decision process approximation, we use the feature matrix to compute the UCB bonus for deep exploration. The proposed method reduces to count-based exploration in special cases and is suitable for general settings. Our method uses the last d-dimensional feature vector of the critic network and is easy to deploy. We design a simple task, "swim", to demonstrate how the proposed method achieves exploration in sparse/deceptive reward environments. We then perform an empirical evaluation on sparse/deceptive-reward versions of Gym environments and on Ackermann robot control tasks. The evaluation results verify that the proposed algorithm performs effective deep exploration in sparse/deceptive reward tasks.
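The abstract does not give the exact form of the bonus, but under the linear Markov decision process approximation it refers to, a UCB bonus is typically computed from the critic's last-layer feature vector phi(s, a) as beta * sqrt(phi^T Lambda^{-1} phi), where Lambda is the regularized covariance of previously observed features. Below is a minimal sketch of that computation, not the authors' implementation; the names `phi`, `Lambda`, `beta`, and the hyperparameter values are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): UCB exploration bonus computed
# from a d-dimensional feature vector, following the linear-MDP form
#   b(s, a) = beta * sqrt(phi(s, a)^T Lambda^{-1} phi(s, a)).
import numpy as np

d = 8        # dimension of the critic's last-layer feature vector (illustrative)
beta = 1.0   # bonus scale hyperparameter (illustrative)
lam = 1.0    # ridge regularizer (illustrative)

def update_covariance(Lambda, phi):
    """Accumulate phi phi^T into the covariance after each visited (s, a)."""
    return Lambda + np.outer(phi, phi)

def ucb_bonus(Lambda, phi, beta=beta):
    """UCB bonus: large for feature directions that have rarely been visited."""
    return beta * np.sqrt(phi @ np.linalg.solve(Lambda, phi))

# Example usage with random features standing in for critic outputs.
rng = np.random.default_rng(0)
Lambda = lam * np.eye(d)
for _ in range(100):
    Lambda = update_covariance(Lambda, rng.normal(size=d))

phi_new = rng.normal(size=d)
print("exploration bonus:", ucb_bonus(Lambda, phi_new))
```

If phi is a one-hot indicator of a discrete state-action pair, then phi^T Lambda^{-1} phi is approximately 1/(N(s, a) + lam), so the bonus reduces to the familiar count-based form beta/sqrt(N(s, a)); this is consistent with the abstract's claim that the method is equivalent to count-based exploration in special cases.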
Keywords: Reinforcement Learning (RL); Neural Network (NN); Upper Confidence Bound (UCB)
Classification Code: TN9 [Electronics and Telecommunications / Information and Communication Engineering]