Authors: ZHANG Dongwen (张冬雯)[1], ZHANG Shaohua (张少华), CHEN Zhenguo (陈振国)[3], ZHANG Guanghua (张光华)[1,2], YU Naiwen (于乃文)
Affiliations: [1] School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018, Hebei, China; [2] State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, Shaanxi, China; [3] Hebei IoT Monitoring Engineering Technology Research Center, North China Institute of Science and Technology, Langfang 065201, Hebei, China
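Source: Microelectronics & Computer (《微电子学与计算机》), 2022, No. 5, pp. 87-95 (9 pages)
Funding: National Key R&D Program of China (2018YFB0804701); National Natural Science Foundation of China (62072239); Science and Technology Program of the Hebei Provincial Department of Science and Technology (20377725D).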
Abstract: Traditional feature extraction for malware classification often relies on a single feature as the detection and classification criterion, which leads to low detection accuracy and poor results. To address this, the paper proposes a malware family classification scheme that extracts and fuses multiple static features and classifies them with ensemble learning algorithms. First, four kinds of static features (bytecode, opcode, API sequence, and grayscale image) are extracted from decompiled malicious samples in the Kaggle dataset. Then, the chi-square test and the Pearson correlation coefficient are used to select important features, keeping those strongly correlated with the class labels. Finally, the selected features are fed into ensemble learning models such as GBDT, XGBoost, and random forest for malware family classification. Experimental results show that the ensemble learning scheme based on multi-feature fusion reaches an accuracy of 99.8%. Compared with traditional single-feature machine learning classification schemes, it effectively improves the accuracy of detecting and classifying unknown or variant malware.
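The fusion and feature-selection steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`byte_histogram`, `fuse`, `chi2_score`, `pearson`, `select_top_k`), the toy byte strings, the hypothetical opcode counts, the binary labels, and the choice of keeping the top 2 features are all assumptions made for illustration. The abstract does not say how the chi-square and Pearson rankings are combined, so the sketch simply computes each ranking separately.

```python
import math
from collections import Counter
from itertools import chain


def byte_histogram(data):
    """Normalised 256-bin byte-frequency vector (one simple bytecode feature)."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(b, 0) / total for b in range(256)]


def fuse(*feature_blocks):
    """Concatenate the per-sample vectors of several feature families."""
    return [list(chain.from_iterable(vecs)) for vecs in zip(*feature_blocks)]


def chi2_score(feature, labels):
    """Chi-square statistic between a non-negative feature and the class labels."""
    total = sum(feature)
    if total == 0:
        return 0.0
    score = 0.0
    for cls in set(labels):
        observed = sum(f for f, y in zip(feature, labels) if y == cls)
        expected = total * labels.count(cls) / len(labels)
        score += (observed - expected) ** 2 / expected
    return score


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_top_k(X, y, k, score_fn):
    """Indices of the k columns with the largest absolute score."""
    scored = [(abs(score_fn([row[j] for row in X], y)), j)
              for j in range(len(X[0]))]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return sorted(j for _, j in scored[:k])


# Toy data (assumed): four "samples" from two families, labelled 0 and 1.
raw_samples = [
    bytes([0x90] * 8 + [0x00] * 2),
    bytes([0x90] * 7 + [0x00] * 3),
    bytes([0x00] * 9 + [0xCC]),
    bytes([0x00] * 8 + [0xCC] * 2),
]
opcode_counts = [[5, 1], [4, 1], [0, 1], [1, 1]]  # hypothetical opcode counts
labels = [0, 0, 1, 1]

# Fusion: 256 byte-frequency columns followed by 2 opcode columns.
X = fuse([byte_histogram(s) for s in raw_samples], opcode_counts)

chi2_selected = select_top_k(X, labels, 2, chi2_score)
pearson_selected = select_top_k(X, labels, 2, pearson)
print("chi-square keeps columns:", chi2_selected)
print("Pearson keeps columns:", pearson_selected)
```

In a full pipeline the retained columns would then be passed to an ensemble classifier such as a GBDT, XGBoost, or random forest model. Pearson correlation against a class label is only straightforward for binary or ordinal labels, which is why the toy labels here are 0 and 1.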
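Keywords: data security and computer security; malware classification; static analysis; multi-feature fusion; ensemble learning
Classification number: TP309.5 [Automation and Computer Technology: Computer System Architecture]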