Research on Malware Detection Based on Word Embedding and Feature Fusion

Authors: Shi Zhibin [1], Sun Wenqi, Dou Jianmin, Yu Mengyang

Affiliations: [1] School of Computer Science and Technology, North University of China, Taiyuan 030051; [2] Third Research Institute of the Ministry of Public Security, Shanghai 200031; [3] North Navigation Control Technology Co., Ltd., Beijing 100176

Source: Journal of Information Security Research (《信息安全研究》), 2025, No. 5, pp. 412-419 (8 pages)

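Funding: Open project of the Key Laboratory of Information Network Security, Ministry of Public Security (Third Research Institute of the Ministry of Public Security) (C23600-06).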

Abstract: Existing traditional methods are limited in feature extraction and representation, cannot capture the spatial semantic features and the temporal features of API sequences at the same time, and fail to capture the key features that determine the target task. To address these problems, a malware detection method based on word embedding and feature fusion is proposed, drawing on word embedding from natural language processing together with multi-model feature extraction and feature fusion. First, word embedding is used to encode API sequences and obtain their semantic feature representations. Then, multiple convolutional networks and a Bi-LSTM network extract the n-gram local spatial features and the temporal features of the API sequences, respectively. Finally, a self-attention mechanism deeply fuses the captured features with respect to key position information, and classification is performed by characterizing deep malicious behavior features. Experimental results show that in the binary classification task the method reaches an accuracy of 94.79%, an average improvement of 12.37% over traditional machine learning methods and of 5.78% over deep learning methods. In the multi-class classification task the accuracy reaches 91.95%, so the method effectively improves malware detection accuracy.
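The three stages named in the abstract (encoding, two-branch feature extraction, self-attention fusion) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the example API names, and the toy feature values are all assumptions made for illustration, the learned embedding, convolution filters, and Bi-LSTM are replaced by placeholder inputs, and the attention step uses identity query/key/value projections instead of learned ones.

```python
import math

def encode(api_calls, vocab):
    """Map API names to integer ids; unknown names map to id 0 (assumed convention)."""
    return [vocab.get(name, 0) for name in api_calls]

def ngram_windows(ids, n):
    """Sliding windows of length n: the local context a width-n convolution sees."""
    return [tuple(ids[i:i + n]) for i in range(len(ids) - n + 1)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    dim = len(features[0])
    attended = []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in features]
        weights = softmax(scores)
        # Each output row is a weighted average of all input rows.
        attended.append([sum(w * value[j] for w, value in zip(weights, features))
                         for j in range(dim)])
    return attended

def fuse(conv_feats, lstm_feats):
    """Concatenate per-position features of both branches, attend, then mean-pool."""
    joint = [c + r for c, r in zip(conv_feats, lstm_feats)]
    attended = self_attention(joint)
    dim = len(attended[0])
    return [sum(row[j] for row in attended) / len(attended) for j in range(dim)]

# Hypothetical vocabulary and call trace, chosen only for illustration.
vocab = {"<unk>": 0, "CreateFileW": 1, "WriteFile": 2, "RegSetValueExW": 3}
ids = encode(["CreateFileW", "WriteFile", "RegSetValueExW", "Sleep"], vocab)
bigrams = ngram_windows(ids, 2)

# Placeholder branch outputs standing in for the CNN and Bi-LSTM features.
fused = fuse([[1.0, 0.0], [0.0, 1.0]], [[0.5], [0.5]])
```

In a trained model the fused vector would feed a classification layer. Here it only shows how attention weights mix the positions of the two feature streams before pooling.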

Keywords: malware detection; software call sequence; multiple convolutional networks; long short-term memory network; feature fusion

Classification code: TP309 [Automation and Computer Technology: Computer System Architecture]
