AST-Level Webshell Detection Method Based on BERT and Self-Attention SRU

Authors: LI Daofeng [1]; NING Ziheng (School of Computer and Electronic Information, Guangxi University, Nanning 530004, China)

Affiliation: [1] School of Computer and Electronic Information, Guangxi University, Nanning 530004

Source: Netinfo Security, 2025, Issue 2, pp. 270-280 (11 pages)

Funding: National Natural Science Foundation of China [61662004].

Abstract: Webshells, as covert and highly damaging web backdoors, have drawn significant attention in the field of cybersecurity. Code obfuscation significantly reduces the effectiveness of traditional detection methods, and many traditional detection models fail to handle large-scale data efficiently. This paper therefore proposes BAT-SRU, a Webshell detection method that combines BERT word embeddings, a bidirectional SRU network, and a self-attention mechanism. The method extracts code features from abstract syntax trees, combines sample de-obfuscation with dangerous-function statistics to improve feature quality, and uses the BAT-SRU model for classification. Existing methods, such as detection based on Word2Vec and bidirectional GRU, classification using opcode sequences and random forests, and AST-based feature extraction with Text-CNN, suffer from insufficient feature representation and poor adaptability to heavily obfuscated code. Compared with these methods, BAT-SRU performs better at detecting PHP Webshells, achieving an accuracy of 99.68%, a precision of 99.13%, a recall of 99.22%, and an F1 score of 99.18%. In addition, compared with RNNs and their variant models, BAT-SRU reduces training time by 23.47% and inference time by 40.14%.
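The paper's PHP implementation is not reproduced in this record. As a language-agnostic illustration of the "dangerous function statistics" step, the sketch below uses Python's standard-library `ast` module as a stand-in for a PHP AST parser; the watch-list is hypothetical (the paper's PHP equivalents would be functions like `eval`, `assert`, or `shell_exec`):

```python
import ast
from collections import Counter

# Hypothetical watch-list for illustration only; not the paper's actual
# PHP dangerous-function list.
DANGEROUS_CALLS = {"eval", "exec", "compile", "__import__"}

def dangerous_function_stats(source: str) -> Counter:
    """Parse source into an AST and count calls to watched functions.

    Walking the AST rather than matching raw text is robust to
    whitespace, comment, and formatting tricks, which is the motivation
    for AST-level detection in the first place.
    """
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DANGEROUS_CALLS:
                counts[node.func.id] += 1
    return counts

sample = "x = eval('1+1')\nexec('print(x)')\n"
stats = dangerous_function_stats(sample)  # Counter({'eval': 1, 'exec': 1})
```

These per-function counts would then be combined with the AST-derived token features before embedding, per the pipeline described in the abstract.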
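The training/inference speedup claimed over RNN variants comes from the SRU architecture itself. As a rough sketch (following the published SRU recurrence of Lei et al., not the authors' BAT-SRU code, and using scalar toy weights), the point is that all input projections depend only on the current input and can be computed for every timestep in parallel, leaving only a cheap elementwise state update as the sequential part:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sru_cell(xs, w, w_f, b_f, w_r, b_r):
    """Run a single-unit SRU over a scalar input sequence xs.

    SRU recurrence, scalar form:
        f_t = sigmoid(w_f * x_t + b_f)            # forget gate
        r_t = sigmoid(w_r * x_t + b_r)            # reset gate
        c_t = f_t * c_{t-1} + (1 - f_t) * (w * x_t)
        h_t = r_t * tanh(c_t) + (1 - r_t) * x_t   # highway connection
    The projections w*x_t, w_f*x_t, w_r*x_t involve no recurrent state,
    so in the matrix form they batch across all timesteps; only the
    elementwise update of c_t is sequential -- the source of SRU's speed
    advantage over LSTM/GRU.
    """
    c, hs = 0.0, []
    for x in xs:
        f = sigmoid(w_f * x + b_f)
        r = sigmoid(w_r * x + b_r)
        c = f * c + (1.0 - f) * (w * x)
        hs.append(r * math.tanh(c) + (1.0 - r) * x)
    return hs

def bi_sru(xs, params_fwd, params_bwd):
    """A bidirectional SRU layer: a forward pass concatenated with a
    pass over the reversed sequence, as in the paper's Bi-SRU encoder."""
    fwd = sru_cell(xs, *params_fwd)
    bwd = sru_cell(xs[::-1], *params_bwd)[::-1]
    return list(zip(fwd, bwd))
```

In BAT-SRU these hidden states would be produced from BERT embeddings rather than raw scalars, and the self-attention layer would then weight them before classification; that stage is omitted here.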

Keywords: PHP Webshell; abstract syntax tree; BERT word embedding; SRU; self-attention

Classification: TP309 [Automation and Computer Technology - Computer System Architecture]

 
