Efficient policy evaluation by matrix sketching  


Authors: Cheng CHEN, Weinan ZHANG, Yong YU

Affiliation: [1] Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China

Source: Frontiers of Computer Science, 2022, Issue 5, pp. 97-105 (9 pages)

Funding: The corresponding author Weinan Zhang was supported by the "New Generation of AI 2030" Major Project (2018AAA0100900) and the National Natural Science Foundation of China (Grant Nos. 62076161, 61772333, 61632017).

Abstract: In reinforcement learning, policy evaluation aims to predict the long-term value of a state under a given policy. As high-dimensional representations become increasingly common in reinforcement learning, reducing the computational cost of policy evaluation has become a significant problem. Many recent works adopt matrix sketching methods to accelerate least-squares temporal difference (TD) algorithms and quasi-Newton temporal difference algorithms. Among these sketching methods, truncated incremental SVD performs well because it is stable and efficient. However, the convergence properties of incremental SVD remain open. In this paper, we first show that conventional incremental SVD algorithms can incur enormous approximation errors in the worst case. We then propose a variant of incremental SVD with better theoretical guarantees, obtained by periodically shrinking the singular values. Moreover, we employ our improved incremental SVD to accelerate least-squares TD and quasi-Newton TD algorithms. Experimental results verify the correctness and effectiveness of our methods.
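The periodic singular-value shrinking described in the abstract can be sketched in the spirit of the well-known Frequent Directions algorithm. The function name, the 2*ell buffer schedule, and the shrinking rule below are illustrative assumptions for exposition, not the paper's exact algorithm:

```python
import numpy as np

def shrinking_sketch(rows, ell):
    """Maintain a 2*ell-row sketch B of a tall matrix, fed row by row.

    When the buffer fills, take an SVD and subtract the ell-th largest
    squared singular value from every squared singular value.  This zeroes
    out the smaller directions and frees half the buffer, while keeping
    ||A^T A - B^T B|| provably bounded.
    """
    d = rows.shape[1]
    B = np.zeros((2 * ell, d))
    next_free = 0
    for a in rows:
        if next_free == 2 * ell:
            # Periodic shrinking step: SVD, then deflate the spectrum.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell - 1] ** 2
            s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = np.zeros((2 * ell, d))
            B[: len(s_shrunk)] = s_shrunk[:, None] * Vt
            next_free = ell  # rows ell .. 2*ell-1 are now zero
        B[next_free] = a
        next_free += 1
    return B

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))
B = shrinking_sketch(A, ell=8)
# Frequent-Directions-style guarantee: the covariance error is at most
# ||A||_F^2 / ell.  Plain truncated incremental SVD (without shrinking)
# carries no such worst-case bound, which is the abstract's motivation.
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
```

In a TD setting, the rows fed into such a sketch would be the feature vectors of visited states, so the expensive covariance-like matrices inside least-squares TD can be replaced by the small sketch.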

Keywords: temporal difference learning; policy evaluation; matrix sketching

Classification: O17 [Science / Mathematics]

 
