A Diffusion Model Approach to Malicious Code Dataset Expansion

Authors: LI Sicong, WANG Jian [1], SONG Yafei [1], WANG Shuo, FENG Cunqian [1]

Affiliations: [1] Air Defense and Antimissile School, Air Force Engineering University, Xi'an 710051, China; [2] Unit 95285, Guilin 541000, Guangxi, China

Source: Journal of Air Force Engineering University, 2025, No. 1, pp. 95-103 (9 pages)

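Funding: National Natural Science Foundation of China (61806219, 61703426, 61876189); Natural Science Basic Research Program of Shaanxi Province (2021JM-226); Young Talent Support Program of the Shaanxi University Association for Science and Technology (20190108, 20220106); Innovation Capability Support Program of Shaanxi Province (2020KJXX-065).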

Abstract: Supported by big data, deep learning models have shown excellent capability in fields such as computer vision and natural language processing. In malicious code image applications, however, training data can be insufficient. Because some malware families have only a limited number of training samples, which cannot fully characterize the distribution of the whole dataset, a deep learning model may overfit to these scarce data and perform poorly. To address this problem, a dataset expansion method based on generating new samples with a diffusion model is proposed. The method learns the transformation from original data to noise and then uses the reverse process to restore noise samples into new, similar samples. The generated samples resemble the original dataset without duplicating it, which alleviates the effect of data imbalance among some families on classification and detection tasks and improves the model's generalization ability.
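The data-to-noise and noise-to-data processes described in the abstract can be illustrated with a minimal sketch. It assumes the standard denoising diffusion (DDPM) formulation with a linear noise schedule; the abstract does not state the schedule, the number of steps, or the image size, so `linear_beta_schedule`, `forward_noise`, `reverse_step`, the 1000 steps, and the 64x64 array are all hypothetical choices made for illustration. In a real pipeline the noise estimate `eps_pred` would come from a trained network (the keywords suggest a U-Net); here the true noise stands in for it, so this is not the authors' implementation.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances; the endpoints are assumed, not from the paper."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) in closed form and return the noise used."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def reverse_step(xt, t, eps_pred, betas, alpha_bar, rng):
    """One ancestral sampling step from x_t to x_{t-1}, given a noise estimate."""
    alpha_t = 1.0 - betas[t]
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_t)
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention per step

# Stand-in for a grayscale malicious code image scaled to [-1, 1].
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))

# Forward process: by the last step almost no signal from x0 remains.
x_last, _ = forward_noise(x0, len(betas) - 1, alpha_bar, rng)

# Reverse process at t = 0, using the true noise in place of a trained predictor.
x_first, eps_first = forward_noise(x0, 0, alpha_bar, rng)
x_rec = reverse_step(x_first, 0, eps_first, betas, alpha_bar, rng)
```

With an exact noise estimate, the final reverse step returns the original image; a trained predictor only approximates that noise, which is why samples drawn from pure noise resemble the training images without copying them.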

Keywords: malicious code detection; diffusion model; malicious code visualization; data augmentation; U-Net

Classification number: TP309 [Automation and Computer Technology: Computer System Architecture]
