Aggregation Model for Software Defect Prediction Based on Data Enhancement by GAN

Cited by: 3

Authors: XU Jinpeng, GUO Xinfeng, WANG Ruibo, LI Jihong

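Affiliations: [1] School of Automation and Software Engineering, Shanxi University, Taiyuan 030006, China; [2] School of Modern Education Technology, Shanxi University, Taiyuan 030006, China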

Source: Computer Science (计算机科学), 2023, No. 12, pp. 24-31 (8 pages)

Funding: Young Scientists Fund of the National Natural Science Foundation of China (61806115).

Abstract: In software defect prediction (SDP), a prediction model is usually built by applying a machine-learning classification algorithm to a dataset of static software metrics such as the C&K (Chidamber and Kemerer) metrics. However, most datasets of static software metrics contain few defects, so class imbalance is severe and the learned SDP models predict poorly. This paper uses a generative adversarial network (GAN) to generate positive (defective) samples, screens the generated samples by their FID score, and thereby enlarges the number of positive samples. It then aggregates the results of multiple submodels by majority voting within the block-regularized m×2 cross-validation (m×2BCV) framework to form the final SDP model. Twenty datasets from the PROMISE repository are used as experimental data, and the random forest algorithm is used to build the SDP aggregation model. Experimental results show that, compared with traditional random over-sampling, SMOTE, and random under-sampling, the proposed SDP aggregation model raises the average F1 value by 10.2%, 5.7%, and 3.4%, respectively, and the stability of F1 improves accordingly. The proposed model achieves the highest F1 value on 17 of the 20 datasets. In terms of AUC, there is no significant difference between the proposed method and the traditional sampling methods.
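The abstract names two computational steps: screening GAN-generated samples by FID score, and majority voting over the submodels trained within m×2BCV. The following is a minimal Python/NumPy sketch of both under stated assumptions; it is not the authors' implementation. `frechet_distance` applies the FID formula, ||μx − μy||² + Tr(Σx) + Tr(Σy) − 2·Tr((ΣxΣy)^(1/2)), directly to raw metric vectors (the paper's feature representation and screening threshold are not given here). `majority_vote` combines 0/1 predictions; its `tie_label` parameter is hypothetical, added because training one submodel per fold gives an even number (2m) of voters, so ties can occur. The block-regularized partitioning itself is not shown.

```python
import numpy as np


def majority_vote(predictions, tie_label=1):
    """Combine binary 0/1 predictions from several submodels.

    predictions: array-like of shape (n_models, n_samples).
    tie_label: hypothetical tie-breaking label (an assumption; the
    source does not say how ties are resolved).
    """
    preds = np.asarray(predictions, dtype=int)
    n_models = preds.shape[0]
    positive_votes = preds.sum(axis=0)          # votes for "defective"
    negative_votes = n_models - positive_votes  # votes for "clean"
    result = (positive_votes > negative_votes).astype(int)
    result[positive_votes == negative_votes] = tie_label
    return result


def frechet_distance(x, y):
    """FID formula applied to Gaussians fitted to two sample sets.

    Assumption: x and y are raw feature matrices of shape
    (n_samples, n_features), e.g. real vs. GAN-generated defect rows.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    # atleast_2d keeps the single-feature case a 1x1 matrix.
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    # Tr(sqrt(cov_x @ cov_y)) is the sum of square roots of the eigenvalues
    # of cov_x @ cov_y, which are real and non-negative for PSD inputs;
    # clipping removes tiny negative values caused by rounding.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 3))
    generated = rng.normal(loc=0.5, size=(200, 3))
    print("FID-style distance:", frechet_distance(real, generated))
    print("votes:", majority_vote([[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 1]]))
```

A lower distance means the generated rows are statistically closer to the real defective rows, so a screening step could keep generator outputs (or checkpoints) with low scores; the exact rule used in the paper is not stated in this abstract.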

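Keywords: generative adversarial network; data augmentation; block-regularized cross-validation; software defect prediction; aggregation model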

CLC number: TP311 (Automation and Computer Technology: Computer Software and Theory)
