Cost Sensitive Boosting Software Defect Prediction Method

Cited by: 7


Authors: LI Li [1]; REN Zhenkang; SHI Kexin (College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China)

Affiliation: [1] College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China

Source: Computer Engineering (《计算机工程》), 2022, No. 3, pp. 175-180 (6 pages)

Funding: Key Project of the Heilongjiang Province Education Science Planning (GJB1421251).

Abstract: Software defect prediction can effectively improve software reliability and help fix vulnerabilities in a system. Boosting resampling is a common way to address the shortage of samples in software defect prediction, but conventional Boosting methods perform poorly on the class imbalance problem in this domain. This study therefore proposes a cost-sensitive Boosting software defect prediction method named CSBst. Because missed detections and false alarms on defective modules carry different costs, the cost-sensitive Boosting method updates the sample weights so that the weight of samples producing the first type of error is increased above both the weight of defect-free class samples and the weight of samples producing the second type of error, which improves the prediction rate of the modules. A threshold moving method is used to integrate the classification results of multiple decision tree base classifiers in order to address overfitting. On this basis, the optimal settings of the weights and the threshold used during model construction are derived analytically. Experiments on the NASA software defect prediction dataset show that, with small samples, the BAL prediction index of CSBst is 7% and 3% higher than that of the CSBKNN and CSCE methods, respectively, and that its time complexity is one order of magnitude lower.
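The abstract does not give the update formulas, so the following is only a minimal sketch of the two ideas it names: a cost-sensitive weight update and a threshold moving vote. It assumes an AdaBoost-style exponential update, assumes that the "first type of error" means a defective module predicted as defect-free, and uses hypothetical names (`cost_sensitive_weight_update`, `threshold_moving_vote`, `cost_miss`, `cost_false_alarm`, `threshold`) that do not come from the paper. It is not the CSBst algorithm itself.

```python
import numpy as np


def cost_sensitive_weight_update(w, y_true, y_pred, alpha,
                                 cost_miss=2.0, cost_false_alarm=1.0):
    """One AdaBoost-style reweighting round with asymmetric costs.

    Labels: 1 = defective, 0 = defect-free.
    Assumption: cost_miss > cost_false_alarm >= 1, so that missed defects
    end up with the largest weights after the update.
    """
    w = np.asarray(w, dtype=float)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    missed = (y_true == 1) & (y_pred == 0)        # defective, predicted clean
    false_alarm = (y_true == 0) & (y_pred == 1)   # clean, predicted defective

    # Correctly classified samples are down-weighted by exp(-alpha).
    factor = np.full(w.shape, np.exp(-alpha))
    # Misclassified samples are up-weighted, scaled by their cost.
    factor[missed] = cost_miss * np.exp(alpha)
    factor[false_alarm] = cost_false_alarm * np.exp(alpha)

    new_w = w * factor
    return new_w / new_w.sum()  # renormalise to a distribution


def threshold_moving_vote(votes, alphas, threshold=0.5):
    """Combine 0/1 votes of base classifiers with a movable threshold.

    votes:  array of shape (n_classifiers, n_samples)
    alphas: per-classifier weights
    A threshold below 0.5 makes the ensemble more willing to flag a
    module as defective.
    """
    votes = np.asarray(votes, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    score = alphas @ votes / alphas.sum()  # weighted share of "defective" votes
    return (score >= threshold).astype(int)
```

With equal starting weights, a missed defect receives a larger updated weight than a false alarm, which in turn receives a larger weight than a correctly classified sample. Lowering `threshold` shifts the ensemble decision toward the defective class.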

Keywords: software defect prediction; decision tree; machine learning; threshold moving method; Boosting method

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
