Authors: Qiao Mengqing; Li Lin[1,2]; Wang Jie; Wan Zhenhua
Affiliations: [1] School of Computer Science & Technology, Wuhan University of Science & Technology, Wuhan 430065, China [2] Hubei Key Laboratory of Intelligent Information Processing & Real-time Industrial Systems, Wuhan University of Science & Technology, Wuhan 430065, China [3] Shenzhen Open Source Internet Security Technology Co., Ltd., Shenzhen, Guangdong 518000, China
Source: Application Research of Computers (《计算机应用研究》), 2023, No. 3, pp. 898-904 (7 pages)
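Funding: Wuhan Key Research and Development Program (2022012202015070); Wuhan University of Science and Technology Graduate Education Reform Research Project (Yjg202111); Hubei Provincial Department of Education Project (2020354); Hubei Provincial College Students' Innovation and Entrepreneurship Training Program (S202110488047).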
Abstract: Malware has evolved continuously in recent years, so a single detection model achieves low accuracy. Combining multiple models through ensemble learning can improve detection, but the accuracy and diversity of the base learners in an ensemble are difficult to balance. This paper therefore proposes an ensemble model generation method based on genetic programming (GP). GP integrates the two stages of feature processing and ensemble construction into a single program tree, which addresses the difficulty that traditional malware ensemble detection models have in balancing individual accuracy and diversity. The method uses the malware detection rate of the ensemble model as the basis for population evolution, which ensures the accuracy of the base learners. While constructing the ensemble, it automatically selects the feature processing method and classification algorithm and optimizes the hyperparameters of the base learners, and it increases base learner diversity by perturbing input attributes and algorithm parameters. Following the principle of survival of the fittest, it evolves an optimal ensemble model with high accuracy and diversity. Experimental results on the EMBER dataset show that the detection accuracy of the best ensemble model reaches 98.88%; further analysis shows that the models generated by the method have high diversity and interpretability.
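The evolutionary loop outlined in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it swaps the tree-based genetic programming for a simplified mutation-only evolutionary search over flat ensemble configurations, uses synthetic data in place of EMBER, and uses a nearest-centroid base learner. All function names, the `bias` hyperparameter, and the population settings are assumptions made for illustration only.

```python
import random

def make_data(n, n_features, rng):
    """Synthetic stand-in data: label 1 ("malware") shifts the first three features."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(3):
            row[j] += 1.5 * label
        X.append(row)
        y.append(label)
    return X, y

def fit_centroids(X, y, features):
    """Train a nearest-centroid base learner restricted to a feature subset."""
    sums = {0: [0.0] * len(features), 1: [0.0] * len(features)}
    counts = {0: 0, 1: 0}
    for row, label in zip(X, y):
        counts[label] += 1
        for k, j in enumerate(features):
            sums[label][k] += row[j]
    return {c: [s / max(counts[c], 1) for s in sums[c]] for c in (0, 1)}

def predict_one(centroids, features, bias, row):
    """Predict 1 if the row is closer to the malware centroid (shifted by bias)."""
    d0 = sum((row[j] - centroids[0][k]) ** 2 for k, j in enumerate(features))
    d1 = sum((row[j] - centroids[1][k]) ** 2 for k, j in enumerate(features))
    return 1 if d1 - bias < d0 else 0

def random_learner(n_features, rng):
    """A base learner config: (random feature subset, bounded bias hyperparameter)."""
    k = rng.randint(2, 4)
    features = tuple(sorted(rng.sample(range(n_features), k)))
    return (features, rng.uniform(-0.5, 0.5))

def detection_rate(individual, train, val):
    """Fitness: fraction of validation malware flagged by the majority vote."""
    X_train, y_train = train
    X_val, y_val = val
    models = [(fit_centroids(X_train, y_train, f), f, b) for f, b in individual]
    positives = [row for row, label in zip(X_val, y_val) if label == 1]
    hits = 0
    for row in positives:
        votes = sum(predict_one(c, f, b, row) for c, f, b in models)
        hits += 1 if 2 * votes > len(models) else 0
    return hits / max(len(positives), 1)

def mutate(individual, n_features, rng):
    """Perturb one base learner: either its input attributes or its parameter."""
    child = list(individual)
    i = rng.randrange(len(child))
    if rng.random() < 0.5:
        child[i] = random_learner(n_features, rng)  # input attribute perturbation
    else:
        f, b = child[i]
        child[i] = (f, max(-0.5, min(0.5, b + rng.gauss(0.0, 0.1))))  # parameter perturbation
    return child

def evolve(train, val, n_features, rng, pop_size=12, ensemble_size=5, generations=10):
    """Keep the fitter half each generation and refill with mutated survivors."""
    population = [[random_learner(n_features, rng) for _ in range(ensemble_size)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=lambda ind: detection_rate(ind, train, val),
                        reverse=True)
        history.append(detection_rate(ranked[0], train, val))
        survivors = ranked[: pop_size // 2]
        children = [mutate(rng.choice(survivors), n_features, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = max(population, key=lambda ind: detection_rate(ind, train, val))
    return best, detection_rate(best, train, val), history

rng = random.Random(0)
n_features = 8
train = make_data(200, n_features, rng)
val = make_data(100, n_features, rng)
best, best_fit, history = evolve(train, val, n_features, rng)
print("best detection rate on validation data:", best_fit)
```

Because the fitter half of the population survives unchanged, the best fitness in `history` never decreases from one generation to the next. The bias is clamped to a small range because a fitness based on detection rate alone would otherwise reward flagging every sample.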
Classification code: TP309.2 [Automation and Computer Technology: Computer System Architecture]