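Affiliations: [1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006 [2] Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000 [3] Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012 [4] Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, Jiangsu 215006 [5] School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500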
Source: Chinese Journal of Computers (《计算机学报》), 2019, No. 8, pp. 1812-1826 (15 pages)
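Funding: Supported by the National Natural Science Foundation of China (61303108, 61373094, 61772355); the Major Project of Natural Science Research of Jiangsu Higher Education Institutions (17KJA520004); the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04); the Suzhou Key Industrial Technology Innovation Program, Prospective Applied Research Project (SYG201804); the Provincial Key Laboratory Program of Higher Education Institutions, Soochow University (KJS1524); and the China Scholarship Council (201606920013).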
Abstract: Deep reinforcement learning uses deep learning to perceive environment information and reinforcement learning to solve for optimal decisions, and is one of the main research hotspots in artificial intelligence today. However, most work on deep reinforcement learning does not consider safety, and some methods even deliberately add randomized exploration to broaden sampling coverage in the hope of obtaining a better approximately optimal solution. Exploratory learning without safety control, however, may well bring major risks. To address this problem, a dual deep network based secure deep reinforcement learning (DDN-SDRL) method is proposed. DDN-SDRL maintains an experience pool of dangerous samples and an experience pool of secure samples: the former records samples of the critical states and dangerous states encountered when exploration fails, and the latter records the samples that remain after critical and dangerous states are removed. DDN-SDRL adds a deep Q network to the original network model to train on the dangerous samples, encoding the high-dimensional input into an abstract representation and then decoding it into features. It also introduces a penalty term to describe critical states, and computes the objective function from the original network's objective function and the penalty term. The penalty term is obtained by training the deep Q network with samples from the dangerous-sample experience pool as input. Because DDN-SDRL exploits information about critical, dangerous, and secure states, the agent can improve safety by avoiding samples of dangerous states and preferring samples of secure states. DDN-SDRL is general and can be combined with a variety of deep network models. Experiments verify the effectiveness of the method.
Reinforcement learning is a widely studied class of machine learning methods in which the agent interacts continuously with the environment with the goal of maximizing long-term return. Reinforcement learning is particularly prominent in areas such as control and optimal scheduling. Deep reinforcement learning can take large-scale, high-dimensional data, e.g. video and images, as raw input; it uses deep learning methods to extract abstract representations of the data and then uses reinforcement learning methods to obtain optimal strategies, and it has recently become a research hotspot in artificial intelligence. A large body of work on deep reinforcement learning has emerged. For example, the deep Q network (DQN), one of the most famous models in deep reinforcement learning, is based on convolutional neural networks (CNNs) and the Q-learning algorithm and directly uses the unprocessed image as input. DQN has been applied to learn strategies in complex environments with high-dimensional input. However, few deep reinforcement learning algorithms consider how to ensure security while learning in an unknown environment. Moreover, many reinforcement learning algorithms intentionally add random exploration approaches, e.g. ε-greedy, to guarantee the diversity of data sampling so that the algorithm can obtain a better approximately optimal solution. Nevertheless, exploration without any security constraint is very dangerous and carries a high risk of disastrous results. To solve this problem, an algorithm named dual deep network based secure deep reinforcement learning (DDN-SDRL) is proposed. The DDN-SDRL algorithm sets up two experience pools. The first is the experience pool of dangerous samples, which includes the critical states and dangerous states that caused failure; the second is the experience pool of secure samples, which excludes critical states and dangerous states. The DDN-SDRL algorithm takes advantage of an additional deep Q
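The two experience pools described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `DualExperiencePool`, the `critical_window` parameter (treating the last few transitions before a failure as the critical and dangerous samples), and the `penalized_q` helper (subtracting a weighted penalty from the original Q value) are all assumptions made for illustration, since the abstract does not state how critical states are delimited or how the penalty term enters the objective.

```python
import random
from collections import deque


class DualExperiencePool:
    """Hypothetical sketch of a dangerous-sample pool plus a secure-sample pool."""

    def __init__(self, capacity=10000, critical_window=3):
        self.safe = deque(maxlen=capacity)    # transitions with critical/dangerous ones removed
        self.danger = deque(maxlen=capacity)  # transitions leading into a failure
        # Assumption: the last `critical_window` transitions of a failed
        # episode are treated as the critical and dangerous samples.
        self.critical_window = critical_window
        self._episode = []

    def add(self, state, action, reward, next_state, done, failed=False):
        """Buffer one transition; route the whole episode to the pools when it ends."""
        self._episode.append((state, action, reward, next_state, done))
        if done:
            self._flush(failed)

    def _flush(self, failed):
        k = self.critical_window if failed else 0
        split = max(len(self._episode) - k, 0)
        self.safe.extend(self._episode[:split])
        self.danger.extend(self._episode[split:])
        self._episode = []

    def sample(self, pool, batch_size):
        """Draw a random minibatch from the 'safe' or the 'danger' pool."""
        source = self.danger if pool == "danger" else self.safe
        return random.sample(list(source), min(batch_size, len(source)))


def penalized_q(q_value, penalty, weight=1.0):
    """Assumed combination rule: original Q value minus a weighted penalty term."""
    return q_value - weight * penalty
```

In such a setup, a second Q network would be trained on minibatches from `sample("danger", ...)` to produce the penalty, while the original network trains on `sample("safe", ...)`; the exact training procedure is given in the paper itself, not in this sketch.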
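Keywords: reinforcement learning; deep reinforcement learning; deep Q network; secure deep reinforcement learning; secure artificial intelligence; experience replay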
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]