Authors: LI Sicong, WANG Jian[1], SONG Yafei[1], WANG Shuo, FENG Cunqian[1]
Affiliations: [1] Air Defense and Antimissile School, Air Force Engineering University, Xi'an 710051, China; [2] Unit 95285, Guilin 541000, Guangxi, China
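Source: Journal of Air Force Engineering University (《空军工程大学学报》), 2025, No. 1, pp. 95-103 (9 pages)
Funding: National Natural Science Foundation of China (61806219, 61703426, 61876189); Natural Science Basic Research Program of Shaanxi Province (2021JM-226); Young Talent Support Program of the Shaanxi Association for Science and Technology in Universities (20190108, 20220106); Shaanxi Innovation Capability Support Program (2020KJXX-065).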
Abstract: Supported by big data, deep learning models have shown excellent capabilities in fields such as computer vision and natural language processing. In malicious-code image applications, however, training data may be insufficient. Because some malware families have only a limited number of training samples, which cannot fully characterize the distribution of the whole dataset, deep learning models may overfit these scarce data, resulting in poor performance. To address this problem, a dataset expansion method is proposed that generates new samples with a diffusion model. The method learns the transformation from original data to noise and uses the reverse process to restore noise samples into new, similar samples, thereby expanding the dataset with samples that are similar to, but distinct from, the originals. This alleviates the impact of data imbalance among some families on the classification and detection task and improves the model's generalization ability.
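The "original data to noise" step described in the abstract can be illustrated with a minimal sketch. It assumes the standard denoising-diffusion formulation (a linear noise schedule and the closed-form sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps), which the abstract does not confirm. All function names, the schedule values, and the toy pixel row are hypothetical, and the learned reverse (denoising) network is omitted entirely.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the abstract does not specify one."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) in closed form; also return the noise used."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + sigma * e for x, e in zip(x0, noise)]
    return xt, noise


if __name__ == "__main__":
    # Toy stand-in for one row of a malware grayscale image, scaled to [-1, 1].
    pixel_row = [-1.0, -0.5, 0.0, 0.5, 1.0]
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    rng = random.Random(0)
    for t in (0, 499, 999):
        noisy_row, _ = forward_noise(pixel_row, t, alpha_bars, rng)
        print(t, [round(v, 3) for v in noisy_row])
```

In a full pipeline of this kind, a network would be trained to predict the returned noise from the noisy sample and the step index, and new samples would be produced by running that prediction backwards from pure noise. That part depends on details the abstract does not give, so it is left out here.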
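Keywords: malicious code detection; diffusion model; malicious code visualization; data augmentation; U-Net
Classification code: TP309 [Automation and Computer Technology: Computer System Architecture]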