Influence Analysis Method of Class Imbalance on Software Defect Prediction Model Stability and Prediction Performance

Authors: ZHANG Yan-mei [1,2]; ZHI Sheng-lin; JIANG Shu-juan [1,2]; YUAN Guan [1,2]

Affiliations: [1] Mine Digitization Engineering Research Center of the Ministry of Education, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China; [2] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China; [3] KeHua Data Co., Ltd., Shenzhen, Guangdong 518055, China

Source: Acta Electronica Sinica, 2023, No. 8, pp. 2076-2087 (12 pages)

Funding: National Natural Science Foundation of China (No. 61673384, No. 71774159); China Postdoctoral Science Foundation Special Funding (No. 2021T140707).

Abstract: This paper proposes a method for analyzing the influence of class imbalance on the stability and prediction performance of software defect prediction models. First, undersampling is used to construct, from the original data set, a group of new data sets whose imbalance rates are lower than that of the original data set. A fixed seed is used during construction so that data sets built from the same original data set at the same imbalance rate contain identical data, which reduces the randomness of results across runs. Second, with the MCC value as the performance evaluation indicator of the prediction model, each newly generated data set is passed to the model's classification algorithm for training, prediction, and evaluation, yielding the MCC value of the current data set at each imbalance rate; a stability evaluation indicator is also proposed. The experimental results show that, compared with AUC, MCC is more suitable as a stability evaluation indicator for software defect prediction models under class imbalance, and that, in terms of the stability of software defect prediction performance, cost-sensitive models perform better than ensemble models.
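The two steps described in the abstract (seeded undersampling to a lower imbalance rate, then scoring with MCC) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `undersample` and `mcc`, the 0/1 label encoding, and the reading of "imbalance rate" as the ratio of non-defective to defective instances are all assumptions made here. The paper's stability indicator is not defined in the abstract, so it is not reproduced.

```python
import math
import random


def undersample(labels, target_ratio, seed=0):
    """Return sorted indices of a subset with roughly the requested imbalance rate.

    Assumptions (not from the paper): labels are 0 (non-defective, majority)
    or 1 (defective, minority), and the imbalance rate is majority / minority.
    """
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Keep every minority instance and drop majority instances at random.
    keep = min(len(majority), round(target_ratio * len(minority)))
    rng = random.Random(seed)  # fixed seed: same data and rate give the same subset
    return sorted(minority + rng.sample(majority, keep))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical data set: 10 defective and 90 non-defective modules (rate 9).
    labels = [1] * 10 + [0] * 90
    for rate in (1, 2, 4):
        subset = undersample(labels, rate, seed=42)
        print(rate, len(subset))
```

In an actual experiment, each subset would be used to train and evaluate a classifier, and `mcc` would be applied to that classifier's predictions at each imbalance rate; any classifier could be substituted at that point.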

Keywords: class imbalance; defect prediction; stability; prediction performance; evaluation indicator

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
