Research on Generative Techniques for Identifying and Extracting Tactics, Techniques and Procedures

Authors: YU Fengrui (于丰瑞); DU Yanhui (杜彦辉) [1] (School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China; Inner Mongolia Police Professional College, Hohhot 010051, China)

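Affiliations: [1] School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China; [2] Inner Mongolia Police Professional College, Hohhot 010051, China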

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2025, No. 1, pp. 118-131 (14 pages)

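Funding: Double First-Class Innovation Research Project for Cyberspace Security Law Enforcement Technology of People's Public Security University of China (2023SYL07); Key Research Project of Inner Mongolia Police Professional College (NMJY2022-LX-ZD007).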

Abstract: The MITRE ATT&CK framework defines 14 tactics and 625 techniques covering the full course of a cyber attack, and it has gradually become the de facto standard for tactics, techniques, and procedures (TTP) intelligence. Building on this taxonomy, existing research recasts TTP identification and extraction as a sentence-level multi-class classification task over tactic and technique categories, using deep learning or large language models driven by prompt engineering. However, because few-shot categories make up a large share of the datasets and multi-class models hit a performance bottleneck, category coverage and precision remain low. This paper proposes a method that combines ChatGPT data augmentation with instruction-supervised fine-tuning of a large language model to address sentence-level multi-class classification of technique categories. The ChatGPT data augmentation method enriches sample diversity while preserving the semantics of the original samples, providing high-quality training data for high-performance few-shot recognition; the experimental results demonstrate the superiority of this augmentation method. Instruction-supervised fine-tuning of the large language model overcomes the performance bottleneck of deep learning multi-class models and covers all 625 technique categories, with Precision, Recall, and F1-score reaching 86.2%, 89.9%, and 88.0%, respectively, surpassing existing research.
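As a rough illustration of the two ideas in the abstract, the sketch below formats one labelled sentence as an instruction-tuning record and checks that the reported F1-score is consistent with the harmonic mean of the reported Precision and Recall. The record layout (`instruction`/`input`/`output` keys), the prompt wording, the example sentence, and both function names are assumptions made for illustration; the abstract does not specify the prompt format or the averaging scheme the authors used.

```python
def build_sft_record(sentence: str, technique_id: str) -> dict:
    """Wrap one labelled sentence as an instruction-tuning sample.

    The three-key layout is a common convention for supervised fine-tuning
    data and is an assumption here, not the paper's documented format.
    """
    return {
        "instruction": "Identify the MITRE ATT&CK technique ID described by the sentence.",
        "input": sentence,
        "output": technique_id,
    }


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample: T1566 is the ATT&CK technique ID for Phishing.
record = build_sft_record(
    "The attacker sent a spear-phishing email with a malicious attachment.",
    "T1566",
)

# Precision and Recall as reported in the abstract, in percent.
reported_f1 = f1_score(86.2, 89.9)

print(record["output"], round(reported_f1, 1))  # prints: T1566 88.0
```

The harmonic mean of 86.2 and 89.9 is about 88.01, which rounds to the 88.0% F1-score stated in the abstract.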

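Keywords: cyber threat intelligence (CTI); tactics, techniques and procedures (TTP); ATT&CK; data augmentation; large language model; supervised fine-tuning (SFT)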

Classification code: TP393.08 [Automation and Computer Technology: Computer Application Technology]
