Code Defect Detection Based on Program Semantics and Metrics

Authors: LU Yue; JI Youqing; ZHOU Liliang; LYU Qing; ZHANG Yingzhou[1]

Affiliations: [1] College of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China; [2] The 10th Research Institute of China Electronics Technology Group Corporation, Chengdu 610036, Sichuan, China

Source: Journal of North University of China (Natural Science Edition), 2025, No. 1, pp. 105-115 (11 pages)

Funding: National Natural Science Foundation of China (62272214).

Abstract: Code defects in software seriously degrade both the user experience and the security of software. Traditional code defect detection methods suffer from low accuracy, while existing deep-learning-based methods detect at a coarse granularity and still perform less well than desired. To address this, the paper proposes a code defect detection method based on program semantics and metrics. The method applies a point-of-interest detection algorithm for code defects that operates on LLVM IR, and uses SymPas, a lightweight symbolic program slicing tool, to obtain the program slices related to each defect point of interest. A pre-trained model converts the sliced code fragments into vector representations, which are fused with an instruction-level slice metric, the cognitive complexity metric, so that the relationships between sliced statements and their features can be analyzed in depth. A hybrid model, ResCNN-GRU, is then trained to fuse and learn the extracted features. The experimental results show that symbolic program slicing refines the granularity of vulnerability detection; that the semantic and metric information fused under the LLVM IR intermediate representation better captures the relationships and features of defective code statements; and that the hybrid model mitigates, to some extent, both the time-series problem and the problem of imbalanced sample counts. In comparison with several state-of-the-art methods, the proposed method achieves an accuracy of 94.1%.

Keywords: pre-trained model; program slicing; slice cognitive domain; residual network; convolutional neural network; gated neural network

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]
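
The abstract's first step, locating defect points of interest in LLVM IR before slicing, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which instructions count as points of interest, so the sketch assumes they are calls to a hand-picked set of libc functions. The names `RISKY_CALLEES`, `find_points_of_interest`, and `SAMPLE_IR` are hypothetical and do not come from the paper or from SymPas.

```python
import re

# Hypothetical set of callees treated as points of interest.
# The paper's real criteria are not given in the abstract.
RISKY_CALLEES = {"strcpy", "strcat", "memcpy", "sprintf", "gets"}

# Captures the callee name in a textual LLVM IR call instruction,
# e.g. "%r = call i8* @strcpy(i8* %p, i8* %src)".
CALL_RE = re.compile(r"\bcall\b[^@]*@([\w.]+)\s*\(")


def find_points_of_interest(ir_text):
    """Return (line_number, callee) pairs for calls to risky callees."""
    hits = []
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        match = CALL_RE.search(line)
        if match and match.group(1) in RISKY_CALLEES:
            hits.append((lineno, match.group(1)))
    return hits


# Small hand-written IR fragment used only for demonstration.
SAMPLE_IR = """\
define i32 @main() {
  %buf = alloca [8 x i8]
  %p = getelementptr [8 x i8], [8 x i8]* %buf, i32 0, i32 0
  %r = call i8* @strcpy(i8* %p, i8* %src)
  %n = call i32 @puts(i8* %p)
  ret i32 0
}
"""

print(find_points_of_interest(SAMPLE_IR))  # [(4, 'strcpy')]
```

Each reported line would then serve as a slicing criterion for the slicer. A text-level regular expression is used here only to keep the sketch dependency-free; it misses intrinsics such as `llvm.memcpy.*` and indirect calls, which a real analysis over the IR would need to handle.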

 
