Authors: YU Fengrui; DU Yanhui [1]
Affiliations: [1] School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China; [2] Inner Mongolia Police Professional College, Hohhot 010051, China
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2025, No. 1, pp. 118-131 (14 pages)
Funding: Double First-Class Innovation Research Project on Cyberspace Security Law Enforcement Technology, People's Public Security University of China (2023SYL07); Key Scientific Research Project of Inner Mongolia Police Professional College (NMJY2022-LX-ZD007).
Abstract: The MITRE ATT&CK framework defines 14 tactics and 625 techniques covering the full course of a cyber attack, and it has progressively become the de facto standard for describing tactics, techniques, and procedures (TTPs) in cyber threat intelligence. Building on this taxonomy, existing research recasts TTP identification and extraction as a sentence-level multi-class classification task over tactic and technique categories, and approaches it with deep learning or with large language models driven by prompt engineering. However, because few-sample categories make up a large share of the datasets and multi-class classification models hit a performance bottleneck, both the coverage and the precision of category identification remain low. This paper proposes a method that combines ChatGPT data augmentation with instruction-based supervised fine-tuning of a large language model, and it addresses the sentence-level multi-class classification of technique categories well. The ChatGPT data augmentation method enriches sample diversity while preserving the semantics of the original samples, providing high-quality training data for high-performance recognition in few-sample learning; experimental results also demonstrate the superiority of this augmentation method. Instruction-based supervised fine-tuning of the large language model overcomes the performance bottleneck of deep learning multi-class models and achieves full coverage of all 625 technique categories. Precision, Recall, and F1-score reach 86.2%, 89.9%, and 88.0%, respectively, surpassing existing research.
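The abstract frames technique identification as sentence-level classification learned through instruction-based supervised fine-tuning. A minimal sketch of how labeled sentences might be turned into instruction-style training records follows. The prompt wording, the `instruction`/`input`/`output` field names, and the two sample sentences are assumptions made for illustration and are not taken from the paper; `T1566` (Phishing) and `T1059` (Command and Scripting Interpreter) are real ATT&CK technique IDs used only as example labels.

```python
import json

# Hypothetical instruction text; the paper's actual prompt is not given in the abstract.
INSTRUCTION = (
    "Classify the sentence from a cyber threat intelligence report "
    "into one MITRE ATT&CK technique. Answer with the technique ID only."
)


def build_sft_record(sentence: str, technique_id: str) -> dict:
    """Wrap one labeled sentence as an instruction/input/output record."""
    return {
        "instruction": INSTRUCTION,
        "input": sentence.strip(),
        "output": technique_id,
    }


def to_jsonl(records: list) -> str:
    """Serialize records one JSON object per line, a common fine-tuning input format."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Illustrative samples (invented for this sketch, not from the paper's dataset).
samples = [
    ("The actor sent spearphishing emails with malicious attachments. ", "T1566"),
    ("The malware ran PowerShell commands to download a payload.", "T1059"),
]

records = [build_sft_record(sentence, label) for sentence, label in samples]
jsonl = to_jsonl(records)
print(jsonl)
```

Under this framing, augmented paraphrases of a sentence would simply become additional records that share the same `output` label, which is one way few-sample technique categories could be given more training examples.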
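Keywords: cyber threat intelligence (CTI); tactics, techniques, and procedures (TTP); ATT&CK; data augmentation; large language model; supervised fine-tuning (SFT)
Classification code: TP393.08 [Automation and Computer Technology: Computer Application Technology]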