Source code defect detection using deep convolutional neural networks  (Cited by: 7)

Authors: WANG Xiaomeng (王晓萌), GUAN Zhibin (管志斌), XIN Wei (辛伟)[1], WANG Jiajie (王嘉捷)[1]

Affiliation: [1] China Information Technology Security Evaluation Center, Beijing 100085, China

Source: Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》), 2021, No. 11, pp. 1267-1272 (6 pages)

Funding: National Natural Science Foundation of China (U1836209, U1736110, U1936211, U1936101, U1836113).

Abstract: Source code defect detection methods based on deep neural networks usually treat source code as text data. They either use a convolutional network to learn a single spatial feature of the code, or use LSTM and BiLSTM networks to learn the sequential features of source code samples; the fusion of multiple features of source code has not been studied in depth. To explore how multiple source code features perform in defect detection, this paper adopts the multi-channel learning strategy that convolutional neural networks use in the image domain and fuses the word vectors produced by word embedding techniques such as word2vec and fasttext into a combined vector representation of the source code. A deep convolutional neural network then learns the defect patterns contained in the source code defect data, forming a source code defect classifier that detects multiple classes of defects by recognizing defective code and its corresponding CWE type. The method was compared with existing single-channel neural network defect detection methods on the SARD dataset and on open-source software source code. The results show that the method achieves average precision, recall, and F1 of 95.3%, 84.7%, and 89.7%, respectively, improving on the existing methods by varying margins.
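The multi-channel idea in the abstract can be pictured with a minimal NumPy sketch. It is only an illustration under stated assumptions, not the authors' implementation: the token list, the embedding dimension, the kernel size, and the helper names (`multi_channel_embed`, `conv_relu_maxpool`, `f1_score`) are all hypothetical, and random tables stand in for pretrained word2vec and fasttext vectors. The sketch stacks one embedding matrix per embedding technique as a channel, much like the colour channels of an image, and applies a single convolution filter across all channels. It also checks that the reported F1 is consistent with the reported precision and recall.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical token sequence from a tiny C snippet (illustrative only).
tokens = ["char", "buf", "[", "10", "]", ";",
          "strcpy", "(", "buf", ",", "src", ")", ";"]
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}

EMBED_DIM = 8  # assumed embedding size, not taken from the paper

# ASSUMPTION: random tables stand in for pretrained word2vec / fasttext vectors.
word2vec_table = rng.normal(size=(len(vocab), EMBED_DIM))
fasttext_table = rng.normal(size=(len(vocab), EMBED_DIM))


def multi_channel_embed(tokens, tables, vocab):
    """Return an array of shape (channels, seq_len, dim), one channel per table."""
    ids = np.array([vocab[t] for t in tokens])
    return np.stack([table[ids] for table in tables], axis=0)


def conv_relu_maxpool(x, kernel):
    """Slide a (channels, width, dim) kernel along the token axis,
    apply ReLU, then take the maximum over all positions."""
    seq_len = x.shape[1]
    width = kernel.shape[1]
    activations = np.array([
        np.sum(x[:, i:i + width, :] * kernel)
        for i in range(seq_len - width + 1)
    ])
    return np.maximum(activations, 0.0).max()


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


x = multi_channel_embed(tokens, [word2vec_table, fasttext_table], vocab)
kernel = rng.normal(size=(2, 3, EMBED_DIM))  # one filter spanning 3 tokens
feature = conv_relu_maxpool(x, kernel)

print(x.shape)                            # (2, 13, 8)
print(round(f1_score(0.953, 0.847), 3))   # 0.897
```

A real classifier would use many such filters followed by dense layers and a softmax over the defect classes; the single filter here only shows how the fused channels enter the convolution.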

Keywords: deep convolutional neural network; feature fusion; multi-class classification; source code; defect detection

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
