Research on Malicious URL Detection Using a Multi-Channel Neural Network that Integrates Adversarial Training with BERT-CNN-BiLSTM


Authors: LIU Zhuoxian (刘卓娴), WANG Jingya (王靖亚) [1], SHI Tuo (石拓) [2]

Affiliations: [1] Information and Network Security College, People's Public Security University of China, Beijing 100038, China; [2] Department of Public Security Management, Beijing Police College, Beijing 102202, China

Source: Netinfo Security (《信息网络安全》), 2024, No. 12, pp. 1922-1932 (11 pages)

Funding: Beijing Natural Science Foundation [9244025]; Key Program of the National Social Science Fund of China [20AZD114].

Abstract: A URL is an identifier used to locate network resources. Malicious URLs are frequently exploited to carry out fraud, extortion, and information theft; they have become an important medium for many kinds of cyberattacks in recent years and have caused heavy losses to victims. Malicious URL attacks are increasingly rampant, and malicious URLs themselves have complex, highly obfuscated, and deceptive characteristics. Existing research also suffers from insufficient feature extraction and pays too little attention to model robustness and generalization. To address these problems, this paper proposes a malicious URL detection model that integrates adversarial training with a BERT-CNN-BiLSTM multi-channel neural network. The model treats a URL as a text sequence and uses the BERT model for preprocessing; a CNN layer and a BiLSTM layer then extract local semantic features and contextual sequential features, respectively. Adversarial training with the Fast Gradient Method (FGM) applies perturbations to the embedding layer, which improves the model's accuracy and robustness. Experimental results on public datasets show that the model achieves a classification accuracy of 97.2% on the binary URL classification task. Ablation and comparative experiments further confirm the model's significant advantages across multiple evaluation metrics. The model also performs well on finer-grained classification of malicious URLs, achieving a classification accuracy of 98.25% on a five-class URL classification task.
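
The abstract names FGM as the source of the embedding-layer perturbation but gives no implementation details. The sketch below shows only the core FGM step as it is commonly formulated, r_adv = epsilon * g / ||g||_2, where g is the gradient of the loss with respect to the embedding. Everything else here is an assumption made for illustration and is not taken from the paper: the toy logistic-regression "model", the three-dimensional embedding, the weights, and the epsilon value are all hypothetical.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Return r_adv = epsilon * g / ||g||_2 for a flat gradient vector.

    A zero gradient yields a zero perturbation to avoid division by zero.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0 for _ in grad]
    return [epsilon * g / norm for g in grad]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(embedding, weights, label):
    """Binary cross-entropy of a toy linear classifier (hypothetical stand-in
    for the real network) and its gradient with respect to the embedding."""
    z = sum(w * e for w, e in zip(weights, embedding))
    p = sigmoid(z)
    loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    grad = [(p - label) * w for w in weights]
    return loss, grad


# Hypothetical values chosen only to make the example concrete.
embedding = [0.2, -0.1, 0.4]
weights = [0.5, -1.0, 0.25]
label = 1
epsilon = 0.5

# 1. Forward/backward pass on the clean embedding.
clean_loss, grad = loss_and_grad(embedding, weights, label)

# 2. Build the perturbation and apply it to the embedding.
r_adv = fgm_perturbation(grad, epsilon)
adv_embedding = [e + r for e, r in zip(embedding, r_adv)]

# 3. Recompute the loss on the perturbed embedding.
adv_loss, _ = loss_and_grad(adv_embedding, weights, label)

print(f"clean loss: {clean_loss:.4f}, adversarial loss: {adv_loss:.4f}")
```

In a typical FGM training loop, the gradients from the adversarial pass are accumulated with those from the clean pass, the embedding is restored to its original values, and only then is the optimizer step taken. Whether the paper follows exactly this schedule is not stated in the abstract.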

Keywords: adversarial training; BERT; multi-channel neural network; malicious URL detection

CLC number: TP309 [Automation and Computer Technology: Computer System Architecture]

 
