Federated Offline Reinforcement Learning with Proximal Policy Evaluation  

Authors: Sheng YUE, Yongheng DENG, Guanbo WANG, Ju REN, Yaoxue ZHANG

Affiliations: [1] Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing 100084, China; [2] Zhongguancun Laboratory, Beijing 100194, China

Source: Chinese Journal of Electronics, 2024, No. 6, pp. 1360-1372 (13 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62341201, 62122095, 62072472, 62172445, 62302260, and 62202256); the National Key R&D Program of China (Grant No. 2022YFF0604502); the China Postdoctoral Science Foundation (Grant No. 2023M731956); and a grant from the Guoqiang Institute, Tsinghua University.

Abstract: Offline reinforcement learning (RL) has gathered increasing attention in recent years; it seeks to learn policies from static datasets without active online exploration. However, existing offline RL approaches often require a large amount of pre-collected data and hence are hard for a single agent to carry out in practice. Inspired by advances in federated learning (FL), this paper studies federated offline reinforcement learning (FORL), whereby multiple agents collaboratively perform offline policy learning without sharing their raw trajectories. A straightforward solution is to retrofit off-the-shelf offline RL methods for FL, but such an approach easily overfits individual datasets during local updating, leading to instability and subpar performance. To overcome this challenge, we propose a new FORL algorithm, named model-free (MF)-FORL, that exploits a novel "proximal local policy evaluation" to judiciously push up action values beyond local data support, enabling agents to capture individual information without forgetting the aggregated knowledge. Further, we introduce a model-based variant, MB-FORL, which improves generalization ability and computational efficiency by utilizing a learned dynamics model. We evaluate the proposed algorithms on a suite of complex, high-dimensional offline RL benchmarks, and the results demonstrate significant performance gains over the baselines.
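The abstract does not spell out the update rules, but the overall structure it describes (local offline policy evaluation anchored to an aggregated global model, followed by server-side aggregation) can be sketched in tabular form. The sketch below is a minimal illustration, assuming a FedAvg-style average for aggregation and a quadratic proximal pull of strength MU toward the global action values; the helper names (make_offline_dataset, local_update), the constants, and the exact form of the proximal term are hypothetical and are not the paper's actual MF-FORL operator.

```python
# Minimal sketch of a federated offline RL loop with a proximal term
# in local policy evaluation. ASSUMPTIONS (not from the paper): the
# proximal pull MU * (Q - Q_global), FedAvg aggregation, and all
# names/constants below are illustrative only.
import numpy as np

N_STATES, N_ACTIONS = 8, 4
GAMMA, LR, MU = 0.95, 0.1, 0.5   # MU: assumed proximal strength

def make_offline_dataset(rng, size=500):
    """Synthetic static dataset of (s, a, r, s') transitions for one agent."""
    s = rng.integers(0, N_STATES, size)
    a = rng.integers(0, N_ACTIONS, size)
    r = rng.normal(size=size)
    s2 = rng.integers(0, N_STATES, size)
    return s, a, r, s2

def local_update(Q_global, data, epochs=5):
    """Local policy evaluation anchored to Q_global by a proximal pull."""
    Q = Q_global.copy()
    s, a, r, s2 = data
    for _ in range(epochs):
        for i in range(len(s)):
            target = r[i] + GAMMA * Q[s2[i]].max()
            td = target - Q[s[i], a[i]]
            # Proximal term keeps the local estimate close to the
            # aggregated model, so local updating does not overfit
            # the agent's own dataset and forget global knowledge.
            prox = MU * (Q[s[i], a[i]] - Q_global[s[i], a[i]])
            Q[s[i], a[i]] += LR * (td - prox)
    return Q

rng = np.random.default_rng(0)
datasets = [make_offline_dataset(rng) for _ in range(4)]  # 4 agents
Q_global = np.zeros((N_STATES, N_ACTIONS))
for rnd in range(10):                        # communication rounds
    local_models = [local_update(Q_global, d) for d in datasets]
    Q_global = np.mean(local_models, axis=0) # FedAvg-style aggregation
```

Note that this sketch only captures the "do not forget the aggregated knowledge" side of the trade-off described in the abstract; the paper's proximal local policy evaluation is additionally designed to push up action values beyond local data support, which a plain pull toward Q_global does not do.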

Keywords: Offline reinforcement learning; Batch reinforcement learning; Federated learning; Reinforcement learning

CLC Number: TP18 (Automation and Computer Technology: Control Theory and Control Engineering)

 
