Webshell File Detection Method Based on TF-IDF (cited by: 8)

Authors: ZHAO Rui-jie; SHI Yong [1,2]; ZHANG Han; LONG Jun [1]; XUE Zhi

Affiliations: [1] School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; [2] Shanghai Information Security Integrated Management Technology Laboratory, Shanghai 200240, China

Source: Computer Science (计算机科学), 2020, Issue S02, pp. 363-367 (5 pages)

Funding: National Key Research and Development Program of China (2017YFB0803200).

Abstract: With the rapid development of the Internet, network attacks are becoming increasingly frequent. Webshells are a common attack vector, and traditional detection methods can no longer cope with complex, flexible Webshell variants. To address this problem, a Webshell file detection method based on TF-IDF is proposed. The system first classifies Webshell files by type and applies the corresponding preprocessing and transcoding to each type, reducing the impact of obfuscation techniques on detection. It then builds a bag-of-words model and uses the TF-IDF algorithm to weight and extract relevant features. Finally, it trains the detection model with the XGBoost algorithm. Comparative tests against traditional machine learning algorithms under 10-fold cross-validation show that the model combining TF-IDF preprocessing with XGBoost improves on traditional detection methods in accuracy, precision, and recall, and has stronger robustness and generalization ability. Detection accuracy reaches 98.09% for PHP files and 97.09% for JSP files.
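The feature-extraction step named in the abstract (bag-of-words followed by TF-IDF weighting) can be illustrated with a minimal sketch. This is not the authors' implementation: the identifier-style tokenizer, the smoothed IDF variant `log((1 + N) / (1 + df)) + 1`, and the toy PHP samples are all assumptions made for illustration, and the preprocessing/transcoding and XGBoost training stages are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifier-like tokens only (an assumption,
# not the preprocessing described in the paper).
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(source):
    """Split script source into lowercase identifier-like tokens."""
    return [t.lower() for t in TOKEN_RE.findall(source)]


def tfidf_vectors(documents):
    """Return (vocabulary, TF-IDF vectors) for a list of source strings.

    TF is the token count divided by document length; IDF uses a
    smoothed variant, log((1 + N) / (1 + df)) + 1, chosen here as an
    assumption since the abstract does not state the exact formula.
    """
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each token.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = sorted(df)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Toy samples (invented for illustration): one Webshell-like script
# and two benign ones.
samples = [
    "<?php eval($_POST['cmd']); ?>",
    "<?php echo 'hello'; ?>",
    "<?php echo date('Y'); ?>",
]
vocab, vectors = tfidf_vectors(samples)
```

In this toy corpus the token `eval` appears in only one document, so it receives a higher weight there than `php`, which appears in every document. The resulting vectors are the kind of input a classifier such as XGBoost would then be trained on.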

Keywords: Webshell detection; feature extraction; cross-validation; TF-IDF; multilayer neural network; support vector machine; random forest; XGBoost algorithm

Classification code: TP393 [Automation and Computer Technology: Computer Application Technology]

 
