基于卷积神经网络的代价敏感软件缺陷预测模型  被引量:8

Cost-sensitive Convolutional Neural Network Model for Software Defect Prediction

在线阅读下载全文

作  者:邱少健 蔡子仪 陆璐[1] QIU Shao-jian;CAI Zi-yi;LU Lu(School of Computer Science and Engineering,South China University of Technology,Guangzhou 510000,China)

机构地区:[1]华南理工大学计算机科学与工程学院

出  处:《计算机科学》2019年第11期156-160,共5页Computer Science

基  金:自然科学基金(61370103);广州产学研基金(201802020006);中山产学研基金(2017A1014)资助

摘  要:基于机器学习的软件缺陷预测方法受到软件工程领域学者们的普遍关注,通过缺陷预测模型可一定程度地分析软件中的缺陷分布,以此帮助软件质量保障团队发现软件中潜在的错误并合理分配测试资源。然而,现有多数的缺陷预测方法是基于代码行数、模块依赖程度、栈引用深度等人工提取的软件特征进行缺陷预测的。此类方法未考虑到软件源码中潜在的语义特征,可能导致预测效果不理想。为了解决以上问题,文中利用卷积神经网络挖掘源码中隐含的语义特征,并将其用于软件缺陷预测的任务中。在源码语义特征的有效挖掘方面,采用三层卷积神经网络提取数据抽象特征。在数据不平衡处理方面,采用代价敏感的方法,即分别给予正例与反例不同的权重,平衡正反例对模型训练的影响。在实验数据集方面,选取了开源缺陷标注数据集PROMISE中8个软件中的多个版本,合计19个项目。在模型性能比较方面,将提出的基于卷积神经网络的代价敏感软件缺陷预测模型(Cost-Sensitive Three-Layer Convolutional Neural Network,CS-TCNN)分别与逻辑回归、深度置信网络等模型进行比较,评估指标为在缺陷预测研究领域中普遍使用的AUC和MCC。实验结果充分说明了CS-TCNN能更有效地提取程序代码中的语义特征,进而提高软件缺陷预测模型的预测效果。Machine-learning-based software defect prediction methods are received widely attention from the researchers in the field of software engineering.The defect distribution in the software can be analyzed by the defect prediction mo-del,so as to help the software quality assurance team to detect potential software errors and allocate test resources reasonably.However,most of the existing defect prediction methods are based on hand-crafted features such as line of code,dependency between modules and stack reference depth.These methods do not take into account the potential semantic features of the software source code and may result in poor predictions.To solve the above problems,this paper applied convolutional neural networks to mine the semantic features implicit in the source code.In the effective mining of source code semantic features,this paper used three-layer convolutional neural network to extract data abstract features.In terms of data imbalance processing,this paper adopted a cost-sensitive method,which gives different weights to positive and negative examples,and balances the impact of positive and negative examples on model training.In terms of experimental data sets,this paper selected multiple versions of the eight softwares in the PROMISE defect dataset,totaling 19 projects.In terms of model comparison,this paper compared the proposed cost-sensitive software defect prediction model based on convolutional neural network(CS-TCNN) with logistic regression and deep confidence network respectively.The evaluation metrics contain AUC and MCC,which are widely used in the field of defect prediction research.The experimental results demonstrate that CS-TCNN can effectively extract the semantic features in the program code,and improve the prediction effect of the software defect prediction model.

关 键 词:软件缺陷预测 卷积神经网络 语义特征挖掘 代价敏感 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象