Understanding adversarial attacks on observations in deep reinforcement learning (Cited by: 1)


Authors: You QIAOBEN, Chengyang YING, Xinning ZHOU, Hang SU, Jun ZHU, Bo ZHANG

Affiliations: [1] Department of Computer Science and Technology, Beijing National Research Center for Information Science and Technology, Tsinghua-Bosch Joint Center for Machine Learning, Institute for Artificial Intelligence, Tsinghua University, Beijing 100084, China; [2] Peng Cheng Laboratory, Shenzhen 518055, China

Published in: Science China (Information Sciences), 2024, Issue 5, pp. 65-79 (15 pages)

Funding: Supported by the National Key Research and Development Program of China (Grant Nos. 2020AAA0104304, 2017YFA0700904); the National Natural Science Foundation of China (Grant Nos. 61620106010, 62061136001, 61621136008, 62076147, U19B2034, U1811461, U19A2081); the Beijing NSF Project (Grant No. JQ19016); the Beijing Academy of Artificial Intelligence (BAAI); the Tsinghua-Huawei Joint Research Program, Tsinghua Institute for Guo Qiang; the Tsinghua-OPPO Joint Research Center for Future Terminal Technology; and the Tsinghua-China Mobile Communications Group Co., Ltd. Joint Institute

Abstract: Deep reinforcement learning models are vulnerable to adversarial attacks that can decrease the cumulative expected reward of a victim by manipulating its observations. Despite the efficiency of previous optimization-based methods for generating adversarial noise in supervised learning, such methods might not achieve the lowest cumulative reward, since they generally do not explore the environmental dynamics. Herein, a framework is provided to better understand the existing methods by reformulating the problem of adversarial attacks on reinforcement learning in the function space. The reformulation adopted herein generates an optimal adversary in the function space of targeted attacks, repelling them via a generic two-stage framework. In the first stage, a deceptive policy is trained by hacking the environment and discovering a set of trajectories leading to the lowest reward or the worst-case performance. Next, the adversary misleads the victim into imitating the deceptive policy by perturbing its observations. Compared with existing approaches, it is theoretically shown that the proposed adversary is strong under an appropriate noise level. Extensive experiments demonstrate the superiority of the proposed method in terms of efficiency and effectiveness, achieving state-of-the-art performance in both Atari and MuJoCo environments.
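The two-stage attack described in the abstract lends itself to a compact illustration. Below is a minimal, hypothetical PyTorch sketch, not the authors' implementation: `NegRewardEnv` illustrates stage one, where a deceptive policy could be trained by negating the environment's reward so that standard RL training seeks the victim's worst-case trajectories; `pgd_perturb` illustrates stage two, a bounded gradient-based perturbation of the observation that pushes the victim's action distribution toward the deceptive policy's. All names, the discrete-action assumption, and the noise budget `eps` / step size `alpha` are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

class NegRewardEnv:
    """Stage 1 helper (hypothetical): wraps the environment so that
    standard RL training on it yields a deceptive policy seeking the
    victim's lowest-reward trajectories. Classic Gym-style 4-tuple
    step API is assumed."""
    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Negate the reward: maximizing it minimizes the victim's return.
        return obs, -reward, done, info


def pgd_perturb(victim, deceptive, obs, eps=0.05, alpha=0.01, steps=10):
    """Stage 2 (hypothetical): find a perturbation delta with
    ||delta||_inf <= eps such that victim(obs + delta) imitates
    deceptive(obs). Both policies are callables mapping observations
    to action logits over a discrete action space."""
    with torch.no_grad():
        # Target action distribution from the (already trained) deceptive policy.
        target = F.softmax(deceptive(obs), dim=-1)
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        logits = victim(obs + delta)
        # Cross-entropy between the victim's perturbed distribution and the target.
        loss = -(target * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend on the imitation loss
            delta.clamp_(-eps, eps)             # project back into the l_inf ball
        delta.grad.zero_()
    return (obs + delta).detach()
```

At rollout time, the adversary would call `pgd_perturb` on each raw observation before it reaches the victim, so the victim approximately executes the deceptive policy's actions while the environment evolves toward the low-reward trajectories discovered in stage one.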

Keywords: deep learning; reinforcement learning; adversarial robustness; adversarial attack

Classification codes: TP18 (Automation and Computer Technology: Control Theory and Control Engineering); TP309 (Automation and Computer Technology: Control Science and Engineering)

 
