SEMI-SUPERVISED SOFTWARE DEFECT PREDICTION BASED ON COST-SENSITIVE LEARNING

Cited by: 1

Authors: Zhang Jinchuan (张金传) [1]; Zhang Zhen (张震) [2]

Affiliations: [1] Zhalantun Vocational College, Hulun Buir 162650, Inner Mongolia, China; [2] School of Computer Science and Engineering, Central South University, Changsha 410000, Hunan, China

Source: Computer Applications and Software (《计算机应用与软件》), 2022, No. 6, pp. 1-6, 69 (7 pages in total)

Funding: National Natural Science Foundation of China (61802441); Hulun Buir Basic Education Science 13th Five-Year Plan Project (19039).

Abstract: Software defect prediction builds prediction models by learning from historical defect data and is an important means of developing trustworthy software. When learning from imbalanced defect data, existing approaches struggle to determine reasonable misclassification costs. Extending the cost-sensitive naive Bayes method, this paper proposes CSNB-EM (EM based Cost-Sensitive Naive Bayes), a semi-supervised learning method that dynamically adjusts model parameters. The method uses cross-validation to search for the optimal misclassification cost for the training data set, builds the classification model with that cost, and then uses unlabeled data to iteratively refine the model parameters. In this way it exploits unlabeled data to improve model performance while overcoming the difficulty of determining misclassification costs in traditional software defect prediction. Comparative experiments using the AUC and GeoM evaluation metrics were carried out on five projects from the MDP software defect data set. The results show that CSNB-EM clearly outperforms existing cost-sensitive software defect prediction methods such as CS-NB (Cost-Sensitive Naive Bayes) and CS-NN (Cost-Sensitive Neural Network).
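The abstract does not give the update equations, so the following is only a minimal sketch of the pipeline it describes, not the authors' implementation. It assumes Gaussian feature likelihoods, a fixed false-positive cost of 1, a small hypothetical grid of candidate false-negative costs, G-mean as the cross-validation criterion, and that GeoM denotes the geometric mean of recall on the defective and non-defective classes. It also searches the cost once before the EM loop, which may differ from how the paper interleaves the two steps. All function names are illustrative.

```python
import math
import random


def fit_nb(X, w_def):
    """Weighted Gaussian naive Bayes; w_def[i] is P(defective) for row i."""
    model = {}
    for c in (0, 1):
        w = [wi if c == 1 else 1.0 - wi for wi in w_def]
        total = sum(w) + 1e-9
        means, variances = [], []
        for j in range(len(X[0])):
            m = sum(wi * x[j] for wi, x in zip(w, X)) / total
            v = sum(wi * (x[j] - m) ** 2 for wi, x in zip(w, X)) / total
            means.append(m)
            variances.append(v + 1e-3)  # variance floor for numerical stability
        model[c] = (total / len(X), means, variances)
    return model


def posterior(model, x):
    """Return P(defective | x) under the naive Bayes model."""
    logp = []
    for c in (0, 1):
        prior, means, variances = model[c]
        lp = math.log(prior + 1e-12)
        for xj, m, v in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        logp.append(lp)
    top = max(logp)
    e0, e1 = math.exp(logp[0] - top), math.exp(logp[1] - top)
    return e1 / (e0 + e1)


def predict(model, x, cost_fn):
    """Minimum-expected-cost label: a false alarm costs 1, a miss costs cost_fn."""
    p = posterior(model, x)
    return 1 if p * cost_fn > 1.0 - p else 0


def geo_mean(y_true, y_pred):
    """Geometric mean of recall on the defective and non-defective classes."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return math.sqrt((tp / pos) * (tn / neg))


def search_cost(X, y, candidates, k=5):
    """Pick the candidate cost with the best mean G-mean over k interleaved folds."""
    n = len(X)
    best_cost, best_score = candidates[0], -1.0
    for cost in candidates:
        scores = []
        for f in range(k):
            test = list(range(f, n, k))
            train = [i for i in range(n) if i % k != f]
            model = fit_nb([X[i] for i in train], [float(y[i]) for i in train])
            preds = [predict(model, X[i], cost) for i in test]
            scores.append(geo_mean([y[i] for i in test], preds))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_cost, best_score = cost, mean_score
    return best_cost


def csnb_em(X_lab, y_lab, X_unl, candidates=(1, 2, 5, 10, 20), n_iter=10):
    """Search the cost on labeled data, then refine the model with EM."""
    cost = search_cost(X_lab, y_lab, list(candidates))
    hard = [float(t) for t in y_lab]
    model = fit_nb(X_lab, hard)
    for _ in range(n_iter):
        soft = [posterior(model, x) for x in X_unl]   # E-step: soft labels
        model = fit_nb(X_lab + X_unl, hard + soft)    # M-step: refit parameters
    return model, cost


# Synthetic, imbalanced toy data (about 10% defective); not the MDP data set.
rng = random.Random(0)


def sample(n, mu):
    return [[rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)] for _ in range(n)]


X_lab, y_lab = sample(90, 0.0) + sample(10, 3.0), [0] * 90 + [1] * 10
X_unl = sample(180, 0.0) + sample(20, 3.0)
X_test, y_test = sample(180, 0.0) + sample(20, 3.0), [0] * 180 + [1] * 20

model, cost = csnb_em(X_lab, y_lab, X_unl)
preds = [predict(model, x, cost) for x in X_test]
print("selected cost:", cost, "G-mean:", round(geo_mean(y_test, preds), 3))
```

Raising `cost_fn` lowers the posterior probability needed to flag a module as defective, which is how a cost-sensitive classifier compensates for the scarcity of defective examples. Searching the value by cross-validation replaces the manual choice that the abstract identifies as the difficulty.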

Keywords: software defect prediction; imbalanced data; cost-sensitive learning; semi-supervised learning

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
