Construction and Verification of Automated Red Teaming Testing Plan Based on Reinforcement Learning (基于强化学习的自动化红队测试计划构建与验证)

Cited by: 1

Authors: WANG Zhen (王震); LI Saifei (李赛飞) [1]; ZHANG Lijie (张丽杰)

Affiliations: [1] School of Information Science & Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; [2] Information Technology Center, Norla Institute of Technical Physics, Chengdu, Sichuan 610041, China

Source: Information Security and Communications Privacy (《信息安全与通信保密》), 2022, No. 8, pp. 71-82 (12 pages)

Funding: Sichuan Science and Technology Program (No. 2021YJ0372); Sichuan Major Science and Technology Special Project (No. 2019ZDZX0007, No. 2021YFQ0056).

Abstract: Automated red teaming testing is an active research topic that aims to make cybersecurity assessment more efficient, lower-cost, and repeatable. Automated attack plan generation is an important part of automated red teaming testing; its purpose is to replace the attack planning otherwise carried out by security experts. This paper combines reinforcement learning with the red teaming testing problem and models the red teaming testing process as a Markov decision process. Policy-based (Policy Gradient, PG) and value-based (Q-Learning, SARSA, and Deep Q Network, DQN) reinforcement learning algorithms are used to train an agent in a simulated environment to construct the attack plan, and the feasibility and adaptability of the attack plan are then verified in an experimental environment. The simulation and experimental results indicate that the PG algorithm learns only a non-optimal attack plan and converges slowly; the Q-Learning, SARSA, and DQN algorithms all learn the optimal attack plan, with Q-Learning converging fastest, SARSA next, and DQN slowest; and the attack plans constructed by the reinforcement learning algorithms show good feasibility and adaptability.

Keywords: cybersecurity; red team; penetration testing; automated planning; reinforcement learning

Classification: TP393 [Automation and Computer Technology - Computer Application Technology]
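
The abstract describes modeling red teaming as a Markov decision process and training value-based agents (Q-Learning, SARSA) to build an attack plan. The following is a minimal sketch of that idea, not the authors' implementation: the four-state attack graph, the reward values, the hyperparameters, and all function names (`step`, `train`, `plan`) are hypothetical choices made purely for illustration.

```python
import random

# Hypothetical toy attack graph (an assumption, not taken from the paper).
# States 0..2 are footholds, state 3 is the goal (e.g. target compromised).
# NEXT_STATE[s][a] is the deterministic successor of taking action a in s.
NEXT_STATE = [[1, 2], [2, 0], [3, 0]]
GOAL = 3
N_ACTIONS = 2


def step(state, action):
    """Apply an attack action: -1 cost per step, +10 on reaching the goal."""
    nxt = NEXT_STATE[state][action]
    reward = 10.0 if nxt == GOAL else -1.0
    return nxt, reward


def greedy(q, state):
    """Index of the highest-valued action (ties go to the lower index)."""
    return max(range(N_ACTIONS), key=lambda a: q[state][a])


def choose(q, state, eps, rng):
    """Epsilon-greedy behavior policy."""
    if rng.random() < eps:
        return rng.randrange(N_ACTIONS)
    return greedy(q, state)


def train(algo, episodes=2000, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-Learning (algo="q_learning") or SARSA (algo="sarsa")."""
    rng = random.Random(seed)
    # One row per state; the goal row stays at zero because it is terminal.
    q = [[0.0] * N_ACTIONS for _ in range(GOAL + 1)]
    for _ in range(episodes):
        state = 0
        action = choose(q, state, eps, rng)
        for _ in range(100):  # cap episode length
            nxt, reward = step(state, action)
            nxt_action = choose(q, nxt, eps, rng)
            if algo == "sarsa":
                bootstrap = q[nxt][nxt_action]  # on-policy: action actually taken next
            else:
                bootstrap = max(q[nxt])  # off-policy: best next action
            q[state][action] += alpha * (reward + gamma * bootstrap - q[state][action])
            if nxt == GOAL:
                break
            state, action = nxt, nxt_action
    return q


def plan(q, max_len=10):
    """Read the attack plan off the table by following greedy actions from state 0."""
    state, actions = 0, []
    while state != GOAL and len(actions) < max_len:
        action = greedy(q, state)
        actions.append(action)
        state, _ = step(state, action)
    return actions


if __name__ == "__main__":
    for algo in ("q_learning", "sarsa"):
        print(algo, plan(train(algo)))
```

The only difference between the two algorithms in this sketch is the bootstrap term: Q-Learning uses the maximum value of the next state, while SARSA uses the value of the action the epsilon-greedy policy actually selects. The convergence ordering reported in the abstract comes from the paper's own simulation and should not be inferred from this toy graph.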
