Authors: LI Rui (李睿)[1], YANG Shu-qun (杨淑群), ZHANG Xin-yu (张新宇)
Affiliation: [1] School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
Source: Software Guide (《软件导刊》), 2022, Issue 5, pp. 67-72 (6 pages)
Funding: National Natural Science Foundation of China (61802252).
Abstract: With the development of information technology, PDF documents, thanks to their favorable properties, have become a popular file format for everyday data exchange, and also a file carrier frequently exploited in APT attacks. Existing malicious PDF detection methods are usually evaluated on balanced datasets, yet in real environments malicious documents are far fewer than benign ones. To handle this class imbalance, a malicious PDF document detection method based on KM-TBSMOTE bi-directional sampling is proposed. Building on the BSMOTE algorithm, the TBSMOTE algorithm synthesizes new samples from generated transition samples, raising the proportion of negative samples. The K-Means algorithm is used to undersample benign PDF samples and, combined with TBSMOTE, brings the classes into balance. Finally, a random forest classifier performs the maliciousness detection. Experiments show that the method performs well on an imbalanced PDF sample set, with an F1 score of 98.98%, a recall of 98.91%, and a false positive rate of 0.026%. Compared with the traditional BSMOTE oversampling method, F1 increases by 1.39%, recall increases by 1.96%, and the false positive rate decreases by 0.048%. The KM-TBSMOTE bi-directional sampling method effectively addresses the impact of class imbalance on the classification model, improves detection performance, and is suitable for detecting malicious PDF documents in real environments.
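The bi-directional sampling idea in the abstract (undersample the benign majority with K-Means, oversample the malicious minority with a SMOTE variant) can be illustrated with a small pure-Python sketch. It rests on stated assumptions: BSMOTE is taken to mean standard Borderline-SMOTE, and K-Means undersampling is assumed to keep the real benign sample nearest each cluster centroid. The paper's TBSMOTE transition-sample step is not described in enough detail here and is not reproduced. All function names (`kmeans_undersample`, `borderline_smote`, `bidirectional_sample`) and parameters (`m`, `k`, `target`) are hypothetical. The random forest stage is omitted; the balanced sets would be passed to any classifier, for example scikit-learn's `RandomForestClassifier`.

```python
import math
import random


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means on tuples of floats; returns k centroids."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def kmeans_undersample(majority, n_keep, rng=None):
    """Hypothetical helper: reduce the majority class to n_keep real samples, one per cluster."""
    remaining = list(majority)
    kept = []
    for c in kmeans(majority, n_keep, rng=rng):
        best = min(remaining, key=lambda p: math.dist(p, c))
        remaining.remove(best)  # never pick the same sample twice
        kept.append(best)
    return kept


def borderline_smote(minority, majority, n_new, m=5, k=5, rng=None):
    """Standard Borderline-SMOTE: interpolate only from 'DANGER' minority samples."""
    rng = rng or random.Random(0)
    pool = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    danger = []
    for i, p in enumerate(minority):
        # m nearest neighbours in the full set; pool index i is p itself
        nbrs = sorted((q for j, q in enumerate(pool) if j != i),
                      key=lambda q: math.dist(p, q[0]))[:m]
        n_maj = sum(1 for _, label in nbrs if label == 0)
        if m / 2 <= n_maj < m:  # borderline: at least half, but not all, neighbours are majority
            danger.append(i)
    seeds = danger or list(range(len(minority)))  # fall back to plain SMOTE seeds
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        x = minority[i]
        # k nearest minority neighbours of the seed
        nn = sorted((j for j in range(len(minority)) if j != i),
                    key=lambda j: math.dist(x, minority[j]))[:k]
        z = minority[rng.choice(nn)]
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, z)))
    return synthetic


def bidirectional_sample(minority, majority, target, rng=None):
    """Shrink the majority and grow the minority so both end up with `target` samples.

    Assumes len(minority) >= 2 and len(minority) <= target <= len(majority).
    """
    rng = rng or random.Random(0)
    kept_majority = kmeans_undersample(majority, target, rng=rng)
    extra = borderline_smote(minority, majority, target - len(minority), rng=rng)
    return list(minority) + extra, kept_majority


if __name__ == "__main__":
    rng = random.Random(42)
    benign = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    malicious = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(20)]
    mal_bal, ben_bal = bidirectional_sample(malicious, benign, target=60, rng=rng)
    print(len(mal_bal), len(ben_bal))  # 60 60
```

Moving both classes toward a shared `target`, rather than only oversampling the minority, is what makes the sampling bi-directional; the value of `target` in this sketch is an illustrative choice, not a setting from the paper.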