Authors: Sheng YUE, Yongheng DENG, Guanbo WANG, Ju REN, Yaoxue ZHANG
Affiliations: [1] Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing 100084, China; [2] Zhongguancun Laboratory, Beijing 100194, China
Source: Chinese Journal of Electronics (电子学报(英文版)), 2024, Issue 6, pp. 1360-1372 (13 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62341201, 62122095, 62072472, 62172445, 62302260, and 62202256); the National Key R&D Program of China (Grant No. 2022YFF0604502); the China Postdoctoral Science Foundation (Grant No. 2023M731956); and a grant from the Guoqiang Institute, Tsinghua University.
Abstract: Offline reinforcement learning (RL), which seeks to learn policies from static datasets without active online exploration, has gathered increasing attention in recent years. However, existing offline RL approaches often require a large amount of pre-collected data and hence can hardly be implemented by a single agent in practice. Inspired by the advancement of federated learning (FL), this paper studies federated offline reinforcement learning (FORL), whereby multiple agents collaboratively carry out offline policy learning without sharing their raw trajectories. A straightforward solution is to simply retrofit off-the-shelf offline RL methods for FL, but such an approach easily overfits individual datasets during local updating, leading to instability and subpar performance. To overcome this challenge, we propose a new FORL algorithm, named model-free (MF)-FORL, which exploits a novel "proximal local policy evaluation" to judiciously push up action values beyond local data support, enabling agents to capture individual information without forgetting the aggregated knowledge. Further, we introduce a model-based variant, MB-FORL, which improves generalization ability and computational efficiency by utilizing a learned dynamics model. We evaluate the proposed algorithms on a suite of complex, high-dimensional offline RL benchmarks; the results demonstrate significant performance gains over the baselines.
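The abstract only outlines the mechanism, so the following is a minimal runnable sketch of the general pattern it describes: each agent performs local policy evaluation on its own static dataset, a proximal term keeps local Q-updates close to the last aggregated model, and a server averages the results. The linear Q-function, the L2 form of the proximal penalty (weight MU), the FedAvg-style averaging, and all names and hyperparameters here are illustrative assumptions, not the paper's actual MF-FORL algorithm.

import numpy as np

# Hedged sketch of a federated offline RL loop. The "proximal" term is
# assumed here to be an L2 pull toward the last aggregated Q-weights;
# the paper's "proximal local policy evaluation" may differ in form.

rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS, N_AGENTS = 4, 3, 5
GAMMA, LR, MU = 0.99, 0.1, 1.0   # MU weights the (assumed) proximal term

def q_values(w, s):
    """Linear Q-function: one weight row per action."""
    return w @ s                                       # shape: (N_ACTIONS,)

def local_update(w_global, dataset, steps=50):
    """Local policy evaluation on a static dataset, regularized toward
    the aggregated weights so local training cannot drift too far."""
    w = w_global.copy()
    for _ in range(steps):
        s, a, r, s2 = dataset[rng.integers(len(dataset))]
        target = r + GAMMA * np.max(q_values(w, s2))   # TD target
        td_err = target - q_values(w, s)[a]
        grad = np.zeros_like(w)
        grad[a] = -td_err * s                          # TD gradient for action a
        grad += MU * (w - w_global)                    # proximal pull (assumption)
        w -= LR * grad
    return w

# Each agent holds its own small offline dataset of (s, a, r, s') tuples;
# raw trajectories are never exchanged, only model weights.
datasets = [[(rng.standard_normal(STATE_DIM), rng.integers(N_ACTIONS),
              rng.standard_normal(), rng.standard_normal(STATE_DIM))
             for _ in range(100)] for _ in range(N_AGENTS)]

w_global = np.zeros((N_ACTIONS, STATE_DIM))
for rnd in range(10):                                  # communication rounds
    locals_ = [local_update(w_global, d) for d in datasets]
    w_global = np.mean(locals_, axis=0)                # FedAvg-style aggregation
print("aggregated Q-weights after training:\n", w_global)

The proximal pull, MU * (w - w_global), is the piece that keeps each agent's local evaluation anchored to the aggregated model, mirroring the overfitting-to-local-data instability the abstract attributes to naive FL retrofits of offline RL.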
Keywords: Offline reinforcement learning; Batch reinforcement learning; Federated learning; Reinforcement learning
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]