Malware classification scheme based on multi-feature fusion

Authors: ZHANG Dongwen [1], ZHANG Shaohua, CHEN Zhenguo [3], ZHANG Guanghua [1,2], YU Naiwen

Affiliations: [1] School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China; [2] State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, Shaanxi 710071, China; [3] Hebei IoT Monitoring Engineering Technology Research Center, North China Institute of Science and Technology, Langfang, Hebei 065201, China

Source: Microelectronics & Computer, 2022, No. 5, pp. 87-95 (9 pages)

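Funding: National Key Research and Development Program of China (2018YFB0804701); National Natural Science Foundation of China (62072239); Science and Technology Program of the Hebei Provincial Department of Science and Technology (20377725D).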

Abstract: Traditional malware classification usually extracts a single type of feature as the basis for detection and classification, which results in low detection accuracy and poor performance. To address this, a scheme is proposed that extracts and fuses multiple static features and uses ensemble learning algorithms to classify malware into families. First, static features from four different perspectives (bytecode, opcodes, API sequences, and grayscale images) are extracted from decompiled malicious samples in a Kaggle dataset. Then, the chi-square test and the Pearson correlation coefficient are used for feature selection, retaining the features that correlate strongly with the class labels. Finally, the selected features are fed into ensemble learning models, including GBDT, XGBoost, and random forest, to classify malware families. Experimental results show that, compared with traditional malware classification schemes, the ensemble-learning scheme based on multi-feature fusion reaches an accuracy of 99.8%, and that, compared with traditional single-feature machine learning schemes, it effectively improves the accuracy of detecting and classifying unknown or variant malware.
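The abstract describes a three-stage pipeline (extract several static feature groups, keep label-relevant features using the chi-square test and Pearson correlation, then classify with ensemble models) but gives no implementation details. The following minimal Python sketch, using only the standard library, illustrates the fusion and selection stages under assumptions that do not come from the paper: opcode and API-call features are plain token counts over a hypothetical vocabulary, the grayscale-image feature is a coarse intensity histogram of the bytes reshaped into rows of an arbitrary width, fusion is simple concatenation, and Pearson correlation is computed against one-vs-rest class indicators because the labels are multi-class. The chi-square statistic follows the same observed-versus-expected formula as scikit-learn's `chi2`. All helper names (`count_vector`, `to_gray_image`, `select_features`, and so on) are hypothetical.

```python
import math
from collections import Counter


def count_vector(tokens, vocab):
    # Frequency of each vocabulary item (e.g. opcodes or API names) in one sample.
    counts = Counter(tokens)
    return [float(counts[t]) for t in vocab]


def to_gray_image(data, width=16):
    # Treat each byte as a 0-255 pixel and cut the byte stream into rows;
    # the last row is zero-padded. width=16 is an arbitrary illustrative choice.
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def intensity_histogram(image, bins=16):
    # Coarse pixel-intensity histogram used as a simple image feature group.
    hist = [0.0] * bins
    for row in image:
        for px in row:
            hist[px * bins // 256] += 1.0
    return hist


def fuse(*groups):
    # Early fusion: concatenate the per-sample feature groups into one vector.
    return [x for group in groups for x in group]


def chi2_scores(X, y):
    # Chi-square statistic of each non-negative feature against the labels:
    # per-class feature totals (observed) vs. class share * feature total (expected).
    classes = sorted(set(y))
    prior = {c: y.count(c) / len(y) for c in classes}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = prior[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def pearson(a, b):
    # Pearson correlation coefficient; 0.0 if either vector is constant.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def pearson_scores(X, y):
    # Assumed multi-class variant: max |r| between the feature and any
    # one-vs-rest class indicator.
    scores = []
    for col in zip(*X):
        best = 0.0
        for c in sorted(set(y)):
            indicator = [1.0 if label == c else 0.0 for label in y]
            best = max(best, abs(pearson(list(col), indicator)))
        scores.append(best)
    return scores


def select_features(X, y, k):
    # Keep feature indices ranked in the top k by BOTH criteria (assumed rule).
    def top_k(scores):
        order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        return set(order[:k])
    return sorted(top_k(chi2_scores(X, y)) & top_k(pearson_scores(X, y)))
```

For example, each sample's row could be built as `fuse(count_vector(opcodes, op_vocab), count_vector(api_calls, api_vocab), intensity_histogram(to_gray_image(raw_bytes)))`, and `select_features(X, y, k)` then returns the column indices to keep; a byte-level feature group could be appended the same way. The reduced matrix could be passed to an ensemble classifier such as scikit-learn's `RandomForestClassifier` or `GradientBoostingClassifier`, or to `xgboost.XGBClassifier`. Requiring a feature to rank highly under both criteria is only one possible way to combine the two tests.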

Keywords: data security and computer security; malware classification; static analysis; multi-feature fusion; ensemble learning

CLC number: TP309.5 [Automation and Computer Technology: Computer System Architecture]
