A vulnerability detection model based on DistilBert-LSTM and multinomial naive Bayes

Cited by: 2

Authors: WANG Xuan[1,2,3]; WANG Xintong; CHEN Yanli[1,2,3]; SUN Zhixin[1,2,3]

Affiliations: [1] Post Big Data Technology and Application Engineering Research Center of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; [2] Post Industry Technology Research and Development Center of the State Posts Bureau (Internet of Things Technology), Nanjing University of Posts and Telecommunications, Nanjing 210003, China; [3] Key Lab of Broadband Wireless Communication and Sensor Network Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

Source: Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2023, No. 2, pp. 102-110 (9 pages)

Funding: Supported by the National Natural Science Foundation of China (62272239, 61972208).

Abstract: Software vulnerability detection is key to maintaining software security, and detecting vulnerabilities efficiently is an active research topic. This paper proposes a vulnerability detection model based on DistilBert-LSTM (long short-term memory) and multinomial naive Bayes. To obtain a deep representation of the source code text of vulnerable functions, DistilBert-LSTM is used to mine the local key features and global temporal features of vulnerabilities and to output the probability that a vulnerability exists. Hard samples encountered during detection are then re-examined with multinomial naive Bayes: the model preprocesses the data with a TF-IDF vectorizer, selects features with a chi-square test, and feeds the result to a multinomial naive Bayes classifier to obtain the final detection result. Experimental results show that, on data from the Common Vulnerabilities and Exposures (CVE) database, the proposed method effectively improves the accuracy and precision of vulnerability detection while reducing the false positive rate and the false negative rate, and that it achieves better performance metrics than other machine learning models.
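The second stage described in the abstract (TF-IDF vectorization, chi-square feature selection, multinomial naive Bayes) can be sketched with scikit-learn as follows. This is only an illustrative sketch and not the authors' implementation. The toy code snippets, the tokenizer pattern, the value `k=5`, the helper names `is_hard` and `detect`, and the 0.4 to 0.6 probability band used to mark a sample as "hard" are all assumptions, since the abstract does not give them. The first-stage DistilBert-LSTM probability is passed in as a plain number instead of being computed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, hand-written function bodies (assumption, not the paper's dataset).
# Labels: 1 = vulnerable, 0 = not vulnerable.
train_code = [
    "char buf[8]; strcpy(buf, input);",
    "char buf[8]; gets(buf);",
    "char dst[16]; strcpy(dst, src); sprintf(dst, fmt);",
    "char buf[8]; strncpy(buf, input, sizeof(buf) - 1);",
    "char buf[8]; fgets(buf, sizeof(buf), stdin);",
    "char dst[16]; snprintf(dst, sizeof(dst), fmt);",
]
train_labels = [1, 1, 1, 0, 0, 0]

# TF-IDF -> chi-square feature selection -> multinomial naive Bayes.
# The identifier-style token pattern and k=5 are illustrative choices.
second_stage = make_pipeline(
    TfidfVectorizer(token_pattern=r"[A-Za-z_]\w*"),
    SelectKBest(chi2, k=5),
    MultinomialNB(),
)
second_stage.fit(train_code, train_labels)


def is_hard(p_vulnerable, low=0.4, high=0.6):
    """Assumed rule: a sample is 'hard' when the first-stage
    probability falls inside an uncertainty band around 0.5."""
    return low <= p_vulnerable <= high


def detect(code, p_first_stage):
    """Return 1 (vulnerable) or 0, deferring hard samples to stage two."""
    if is_hard(p_first_stage):
        return int(second_stage.predict([code])[0])
    return int(p_first_stage > 0.5)
```

With this sketch, `detect(code, 0.93)` returns the first-stage decision directly, while `detect(code, 0.52)` falls inside the assumed band and is decided by the naive Bayes pipeline.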

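Keywords: vulnerability detection; source code representation; language model; long short-term memory network; multinomial naive Bayes classifier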

Classification number: TP393.0 [Automation and Computer Technology: Computer Application Technology]

 
