一种基于领域适配的跨项目软件缺陷预测方法  被引量:15

Domain Adaptation Approach for Cross-project Software Defect Prediction

在线阅读下载全文

作  者:陈曙[1] 叶俊民[1] 刘童 CHEN Shu;YE Jun-Min;LIU Tong(School of Computer,Central China Normal University,Wuhan 430079,China)

机构地区:[1]华中师范大学计算机学院,湖北武汉 430079

出  处:《软件学报》2020年第2期266-281,共16页Journal of Software

基  金:国家科技支撑计划(2015BAK33B00).

摘  要:软件缺陷预测旨在帮助软件开发人员在早期发现和定位软件部件可能存在的潜在缺陷,以达到优化测试资源分配和提高软件产品质量的目的.跨项目缺陷预测在已有项目的缺陷数据集上训练模型,去预测新的项目中的缺陷,但其效果往往不理想,其主要原因在于,采样自不同项目的样本数据集,其概率分布特性存在较大差异,由此对预测精度造成较大影响.针对此问题,提出一种监督型领域适配(domainadaptation)的跨项目软件缺陷预测方法.将实例加权的领域适配与机器学习的预测模型训练过程相结合,通过构造目标项目样本相关的权重,将其施加于充足的源项目样本中,以实例权重去影响预测模型的参数学习过程,将来自目标项目中缺陷数据集的分布特性适配到训练数据集中,从而实现缺陷数据样本的复用和跨项目软件缺陷预测.在10个大型开源软件项目上对该方法进行实证,从数据集、数据预处理、实验结果多个角度针对不同的实验设定策略进行分析;从数据、预测模型以及模型适配层面分析预测模型的过拟合问题.实验结果表明,该方法性能优于同类方法,显著优于基准性能,且能够接近和达到项目内缺陷预测的性能.Software defect prediction aims at the very early step of software quality control, helps software engineers focus their attention on defect-prone parts during verification process. Cross-project defect predictions are proposed in which prediction models are trained by using sufficient training data from already existed software projects and predict defect in some other projects, however, their performances are always poor. The main reason is that, the divergence of the data distribution among different software projects causes a dramatic impact on the prediction accuracy. This study proposed an approach of cross-project defect prediction by applying a supervised domain adaptation based on instance weighting. The sufficient instances drawn from some source project are weighted by assigning target-dependent weights to the loss function of the prediction model when minimizing the expected loss over the distribution of source data, so that the distribution properties of the data from target project can be matched to the source project. Experiments including dataset selection, data preprocessing and results are described over different experiment strategies on ten open-source software projects. Over fitting problems are also studied through different levels including dataset, prediction model and domain adaptation process. The results show that the proposed approach is close to the performance of within-project defect prediction, better than similar approach and significantly better that of the baseline.

关 键 词:软件缺陷预测 软件缺陷度量元 机器学习 迁移学习 领域适配 

分 类 号:TP311.5[自动化与计算机技术—计算机软件与理论] TP181[自动化与计算机技术—计算机科学与技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象