DL-HLVD: Deep Learning-based Hybrid Language Source Code Vulnerability Detection Method

Authors: ZHANG Xuejun[1]; GUO Meifeng; ZHANG Xiao; ZHANG Bin[1]; HUANG Haiyan; CAI Teli (School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)

Affiliation: [1] School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China

Source: Journal of Hunan University: Natural Sciences, 2025, No. 4, pp. 103-113 (11 pages)

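Funding: National Natural Science Foundation of China (61762058); Industrial Support Project of the Gansu Provincial Department of Education (2022CYZC-38); State Grid Science and Technology Project (W32KJ2722010, 522722220013); Key Research and Development Program of Gansu Province (25YEFA089).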

Abstract: Existing deep learning-based source code vulnerability detection methods mainly learn features from a single programming language, so they struggle to detect vulnerabilities that arise from the associations and calls between code units in software projects written in multiple programming languages. To address this, the paper proposes DL-HLVD, a deep learning-based vulnerability detection method for hybrid-language source code. First, a BERT layer converts the code text into low-dimensional vectors, which are fed to a bidirectional gated recurrent unit (BiGRU) to capture contextual features, while a conditional random field (CRF) captures the dependencies between adjacent labels. Second, named entity recognition is performed on the functions written in the different programming languages of the hybrid-language software, and the recognized functions are reconstructed together with the program slicing results to reduce the loss of syntactic and semantic information during code representation. Finally, a bidirectional long short-term memory (BiLSTM) network is designed to extract features of vulnerable code and detect vulnerabilities in hybrid-language software. Experiments on the SARD and CrossVul datasets show that DL-HLVD achieves an overall recall of 95.0% and an F1 score of 93.6% across the two types of vulnerability datasets, improving on the deep learning methods VulDeePecker, SySeVR, and Project Achilles on every metric. This indicates that DL-HLVD improves the overall performance of source code vulnerability detection in hybrid-language scenarios.
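To make the role of the conditional random field in the first step more concrete, the following is a minimal sketch of Viterbi decoding over a BIO tag sequence. It is not the authors' implementation: the tag set (`O`, `B-FUNC`, `I-FUNC`), the token sequence, and every score are hypothetical values chosen for illustration, and the emission scores simply stand in for what a BiGRU layer might output. The point it shows is that transition scores between adjacent labels can correct a locally plausible but invalid tagging.

```python
# Illustrative sketch only: tags, tokens, and scores are made up for this
# example and are not taken from the DL-HLVD paper.

TAGS = ["O", "B-FUNC", "I-FUNC"]  # hypothetical BIO tags for function names


def viterbi_decode(emissions, transitions):
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t (stand-in for BiGRU output)
    transitions[i][j] : score of moving from tag i to tag j (the CRF part)
    """
    n_tags = len(transitions)
    # best[j] is the best score of any path that ends in tag j so far.
    best = list(emissions[0])
    backpointers = []
    for scores in emissions[1:]:
        step_best, step_back = [], []
        for j in range(n_tags):
            # Choose the previous tag that maximizes the path score into tag j.
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            step_best.append(best[prev] + transitions[prev][j] + scores[j])
            step_back.append(prev)
        best = step_best
        backpointers.append(step_back)
    # Trace back from the best final tag to recover the full path.
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return path


def greedy_decode(emissions):
    """Per-token argmax that ignores transitions, for comparison."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in emissions]


NEG = -1e4  # large penalty standing in for a forbidden transition
TRANSITIONS = [
    # to:  O    B-FUNC  I-FUNC
    [0.0, 0.0, NEG],  # from O: I-FUNC may not directly follow O
    [0.0, 0.0, 0.5],  # from B-FUNC
    [0.0, 0.0, 0.5],  # from I-FUNC
]

# Hypothetical sub-tokens at a cross-language call site.
TOKENS = ["lib", ".", "copy", "_buf", "("]
EMISSIONS = [
    [2.0, 0.1, 0.0],
    [2.0, 0.0, 0.1],
    [0.5, 0.4, 1.5],  # locally prefers I-FUNC, which is invalid after O
    [0.2, 0.3, 1.8],
    [2.0, 0.0, 0.2],
]

crf_tags = [TAGS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
greedy_tags = [TAGS[i] for i in greedy_decode(EMISSIONS)]

for token, g, c in zip(TOKENS, greedy_tags, crf_tags):
    print(f"{token:6s} greedy={g:7s} crf={c}")
```

With these made-up scores, the per-token argmax labels `copy` as `I-FUNC` even though no entity has been opened, whereas the transition-aware decoder labels `copy` as `B-FUNC` and `_buf` as `I-FUNC`, yielding a well-formed function-name span.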

Keywords: vulnerability detection; named entity recognition; program slicing; hybrid language

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]
