Construction and Verification of Automated Red Teaming Testing Plan Based on Reinforcement Learning (cited by: 1)


Authors: WANG Zhen, LI Saifei [1], ZHANG Lijie

Affiliations: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; [2] Information Technology Center, Norla Institute of Technical Physics, Chengdu, Sichuan 610041, China

Source: Information Security and Communications Privacy, 2022, No. 8, pp. 71-82 (12 pages)

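Funding: Sichuan Province Science and Technology Program (No. 2021YJ0372); Sichuan Province Major Science and Technology Special Project (No. 2019ZDZX0007, No. 2021YFQ0056).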

Abstract: Automated red teaming is an active research topic. It aims to make cybersecurity assessments more efficient, less costly and repeatable. Automated attack plan generation is a key part of automated red teaming, because it is intended to replace the attack planning otherwise carried out by security experts. This paper combines reinforcement learning with red teaming by modeling the red teaming process as a Markov decision process (MDP). A policy-based algorithm (Policy Gradient, PG) and value-based algorithms (Q-Learning, SARSA and Deep Q-Network, DQN) are used to train agents in a simulated environment to construct attack plans. The feasibility and adaptability of the resulting plans are then verified in an experimental environment. The simulation and experimental results show that the PG algorithm learns only a non-optimal attack plan and converges slowly. Q-Learning, SARSA and DQN all learn the optimal attack plan. Among these three, Q-Learning converges fastest, SARSA second, and DQN slowest. The attack plans constructed by the reinforcement learning algorithms show good feasibility and adaptability.
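The MDP formulation and the two tabular value-based updates named in the abstract can be illustrated with a small sketch. It shows the difference between off-policy Q-Learning and on-policy SARSA.

This is not the paper's environment, reward design or implementation. Every concrete detail below is a hypothetical assumption:

- the five-host toy network `EDGES`;
- the reward values `STEP_COST` and `GOAL_REWARD`;
- the hyperparameters;
- the helper names `step`, `train` and `greedy_plan`.

A state is the attacker's current foothold, and action `a` means "try to exploit host `a`". The attack plan is read off by following the greedy action from the start state. DQN would replace the Q table with a neural network, and PG would learn a policy directly; neither is shown here.

```python
import random

# Hypothetical toy network (an assumption, not from the paper): host 0 is the
# attacker's starting point on the internet, host 4 is the target database.
# EDGES[s] lists the hosts that can be exploited from a foothold on host s.
EDGES = {0: {1, 2}, 1: {3}, 2: {3, 4}, 3: {4}, 4: set()}
N_HOSTS = 5
TARGET = 4
STEP_COST = -1.0     # assumed cost of every attack action
GOAL_REWARD = 10.0   # assumed reward for compromising the target


def step(state, action):
    """Action a means 'exploit host a'. Returns (next_state, reward, done)."""
    # A failed exploit (host not reachable from this foothold) leaves the state unchanged.
    next_state = action if action in EDGES[state] else state
    if next_state == TARGET:
        return next_state, GOAL_REWARD, True
    return next_state, STEP_COST, False


def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, else the first action with max Q."""
    if rng.random() < epsilon:
        return rng.randrange(N_HOSTS)
    return max(range(N_HOSTS), key=lambda a: q_row[a])


def train(method, episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1,
          seed=0, max_steps=50):
    """Tabular training; method is 'q_learning' or 'sarsa'."""
    rng = random.Random(seed)
    q = [[0.0] * N_HOSTS for _ in range(N_HOSTS)]
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(q[s], epsilon, rng)
        for _ in range(max_steps):
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(q[s2], epsilon, rng)  # next action actually taken
            if done:
                target = r                            # no bootstrap from terminal state
            elif method == "q_learning":
                target = r + gamma * max(q[s2])       # off-policy: greedy bootstrap
            else:
                target = r + gamma * q[s2][a2]        # on-policy (SARSA): bootstrap on a2
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
            s, a = s2, a2
    return q


def greedy_plan(q, start=0, max_len=10):
    """Follow the greedy action from start; returns the visited hosts."""
    plan, s = [start], start
    while s != TARGET and len(plan) <= max_len:
        s, _, _ = step(s, max(range(N_HOSTS), key=lambda a: q[s][a]))
        plan.append(s)
    return plan


if __name__ == "__main__":
    for method in ("q_learning", "sarsa"):
        print(method, greedy_plan(train(method)))
```

With these assumed rewards, the shortest route to the target (0, 2, 4) is the optimal plan, and both methods should converge to it. The two updates differ only in the bootstrap term. Q-Learning bootstraps on the best next action. SARSA bootstraps on the action its exploring policy actually takes next, so its value estimates also reflect the cost of exploration.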

Keywords: cybersecurity; red team; penetration testing; automated planning; reinforcement learning

CLC number: TP393 [Automation and Computer Technology / Computer Application Technology]

 
