Authors: Zhengmao ZHU, Honglong TIAN, Xionghui CHEN, Kun ZHANG, Yang YU
Affiliations: [1] School of Artificial Intelligence, Nanjing University, Nanjing 210023, China; [2] Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA 15112, USA
Published in: Frontiers of Computer Science, 2025, Issue 4, pp. 77-90 (14 pages)
Abstract: Model-based methods have recently shown promise for offline reinforcement learning (RL), which aims at learning good policies from historical data without interacting with the environment. Previous model-based offline RL methods employ a straightforward prediction method that maps the states and actions directly to the next-step states. However, such a prediction method tends to capture spurious relations caused by the sampling policy preference behind the offline data. It is sensible that the environment model should focus on causal influences, which can facilitate learning an effective policy that generalizes well to unseen states. In this paper, we first provide theoretical results showing that causal environment models can outperform plain environment models in offline RL, by incorporating the causal structure into the generalization error bound. We also propose a practical algorithm, oFfline mOdel-based reinforcement learning with CaUsal Structured World Models (FOCUS), to illustrate the feasibility of learning and leveraging causal structure in offline RL. Experimental results on two benchmarks show that FOCUS reconstructs the underlying causal structure accurately and robustly and, as a result, outperforms both model-based offline RL algorithms and causal model-based offline RL algorithms.
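The central idea in the abstract, letting each next-state variable depend only on its learned causal parents rather than on every state and action input, can be illustrated with a small sketch. The snippet below is not the authors' FOCUS implementation; it is a minimal PyTorch example under the assumption of a per-dimension dynamics model with a learnable soft mask, where the class name CausalWorldModel, the parameter mask_logits, and the sparsity weight are all illustrative choices rather than the paper's API.

```python
# Minimal sketch (assumed names, not the FOCUS release) of a causally
# structured world model for offline model-based RL: each next-state
# dimension is predicted from a masked subset of (state, action) inputs,
# where the soft mask plays the role of a learned causal parent set.
import torch
import torch.nn as nn

class CausalWorldModel(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        in_dim = state_dim + action_dim
        # One small MLP per next-state dimension, so each dimension can
        # depend on its own (masked) set of inputs.
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(state_dim)
        )
        # Logits of a soft causal mask; rows = next-state dims, cols = inputs.
        # In the paper the structure is identified by causal discovery; here it
        # is simply relaxed and learned jointly with the dynamics (an assumption).
        self.mask_logits = nn.Parameter(torch.zeros(state_dim, in_dim))

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        x = torch.cat([state, action], dim=-1)        # (batch, in_dim)
        mask = torch.sigmoid(self.mask_logits)        # (state_dim, in_dim)
        preds = [head(x * mask[i]) for i, head in enumerate(self.heads)]
        return torch.cat(preds, dim=-1)               # predicted next state

# Toy usage: fit the masked model to offline transitions, with a sparsity
# penalty on the mask encouraging a sparse causal graph.
if __name__ == "__main__":
    model = CausalWorldModel(state_dim=4, action_dim=2)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    s, a, s_next = torch.randn(256, 4), torch.randn(256, 2), torch.randn(256, 4)
    for _ in range(10):
        loss = ((model(s, a) - s_next) ** 2).mean() \
               + 1e-3 * torch.sigmoid(model.mask_logits).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```

A learned model of this masked form can then be rolled out for policy optimization as in standard model-based offline RL; the masking is what suppresses spurious input-output relations induced by the behavior policy in the offline data.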
Keywords: reinforcement learning; offline reinforcement learning; model-based reinforcement learning; causal discovery
CLC number: TP181 [Automation and Computer Technology - Control Theory and Control Engineering]