Authors: Tong Chen, Ji-Qiang Liu, He Li, Shuo-Ru Wang, Wen-Jia Niu, En-Dong Tong, Liang Chang, Qi Alfred Chen, Gang Li
Affiliations: [1] Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing Jiaotong University, Beijing 100044, China; [2] Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China; [3] Donald Bren School of Information and Computer Sciences, University of California, Irvine 92697, U.S.A.; [4] Centre for Cyber Security Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
Source: Journal of Computer Science & Technology (计算机科学技术学报(英文版), English edition), 2021, Issue 5, pp. 1002-1021, 20 pages
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61972025, 61802389, 61672092, U1811264, and 61966009; the National Key Research and Development Program of China under Grant Nos. 2020YFB1005604 and 2020YFB2103802; and the Guangxi Key Laboratory of Trusted Software under Grant No. KX201902.
Abstract: Reinforcement learning, as a form of autonomous learning, is strongly driving the development of artificial intelligence (AI) toward practical applications. Having demonstrated the potential to significantly improve synchronous parallel learning, the parallel-computing-based asynchronous advantage actor-critic (A3C) algorithm opens a new door for reinforcement learning. Unfortunately, the influence of this acceleration on A3C robustness has been largely overlooked. In this paper, we perform the first robustness assessment of A3C based on parallel computing. By perceiving the policy's actions, we construct a global matrix of action probability deviation and define two novel measures, skewness and sparseness, which together form an integral robustness measure. Building on this static assessment, we then develop a dynamic robustness assessment algorithm through situational whole-space state sampling across changing episodes. Extensive experiments with different combinations of agent number and learning rate are conducted on an A3C-based pathfinding application, demonstrating that the proposed robustness assessment can effectively measure the robustness of A3C, achieving an accuracy of 83.3%.
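This record does not define how the action probability deviation matrix, skewness, or sparseness are computed in the paper. As a rough illustration only, the sketch below assumes the deviation matrix holds absolute differences between a reference policy's and a perturbed policy's action probabilities (rows are states, columns are actions), and it substitutes two standard statistics: the Fisher-Pearson moment coefficient of skewness and Hoyer's sparseness measure. The function names `deviation_matrix`, `skewness`, `hoyer_sparseness`, and `assess` are hypothetical, and because the record does not say how the two measures are combined into the integral robustness measure, the sketch returns them separately.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def deviation_matrix(reference: Matrix, perturbed: Matrix) -> Matrix:
    """Absolute per-state, per-action difference in action probabilities.

    Hypothetical definition for illustration: rows are states, columns are
    actions, and both matrices must have the same shape.
    """
    if len(reference) != len(perturbed):
        raise ValueError("matrices must have the same number of states")
    rows = []
    for ref_row, pert_row in zip(reference, perturbed):
        if len(ref_row) != len(pert_row):
            raise ValueError("matrices must have the same number of actions")
        rows.append([abs(p - q) for p, q in zip(ref_row, pert_row)])
    return rows


def skewness(values: List[float]) -> float:
    """Fisher-Pearson moment coefficient g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    # A constant sample has no spread; report skewness 0 by convention.
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def hoyer_sparseness(values: List[float]) -> float:
    """Hoyer's sparseness: 1 for a single nonzero entry, 0 for a uniform vector."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    l1 = sum(abs(v) for v in values)
    l2 = math.sqrt(sum(v * v for v in values))
    if l2 == 0:
        return 0.0  # all-zero deviations: nothing is concentrated, by convention
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


def assess(reference: Matrix, perturbed: Matrix) -> Tuple[float, float]:
    """Return (skewness, sparseness) of the flattened deviation matrix."""
    flat = [d for row in deviation_matrix(reference, perturbed) for d in row]
    return skewness(flat), hoyer_sparseness(flat)


if __name__ == "__main__":
    ref = [[0.5, 0.5], [1.0, 0.0]]    # hypothetical reference policy, 2 states x 2 actions
    pert = [[0.25, 0.75], [1.0, 0.0]]  # hypothetical perturbed policy
    print(assess(ref, pert))
```

Under these assumptions, deviation concentrated in a few state-action entries pushes sparseness toward 1, while deviation spread evenly across the matrix pushes it toward 0; skewness indicates whether the deviation distribution has a long tail of large values.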
Keywords: robustness assessment; skewness; sparseness; asynchronous advantage actor-critic; reinforcement learning
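CLC classification numbers: TP39 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Computer Science and Technology]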