Authors: XU Jinpeng; GUO Xinfeng [1]; WANG Ruibo; LI Jihong
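Affiliations: [1] School of Automation and Software Engineering, Shanxi University, Taiyuan 030006, China; [2] School of Modern Education Technology, Shanxi University, Taiyuan 030006, China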
Source: Computer Science (《计算机科学》), 2023, No. 12, pp. 24-31 (8 pages)
Funding: National Natural Science Foundation of China, Young Scientists Fund (61806115).
Abstract: In software defect prediction, a machine learning classification algorithm is usually used to build a software defect prediction (SDP) model from datasets of static software features such as the C&K metrics. However, most datasets of static software features contain few defects, so the class imbalance is severe and the learned SDP model predicts poorly. This paper uses a generative adversarial network (GAN) to generate positive samples, screens the generated samples by FID score, and adds the retained samples to enlarge the positive class. Then, within the block-regularized m×2 cross-validation (m×2 BCV) framework, the results of the multiple sub-models are aggregated by majority voting to form the final SDP model. Twenty datasets from the PROMISE repository serve as the experimental datasets, and the random forest algorithm is used to build the aggregated SDP model. Experimental results show that, compared with traditional random over-sampling, SMOTE, and random under-sampling, the average F1 value of the proposed aggregated SDP model increases by 10.2%, 5.7%, and 3.4%, respectively, and the stability of F1 improves accordingly. On 17 of the 20 datasets, the proposed aggregated SDP model achieves the highest F1 value. In terms of AUC, there is no significant difference between the proposed method and the traditional sampling methods.
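Keywords: generative adversarial network; data augmentation; block-regularized cross-validation; software defect prediction; aggregation model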
Classification: TP311 [Automation and Computer Technology - Computer Software and Theory]
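Note: the aggregation step described in the abstract (majority voting over the sub-models produced by m×2 cross-validation) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `m_by_2_splits` and `majority_vote` are hypothetical, the splits are plain random halvings rather than the block-regularized partitions of m×2 BCV, the tie-breaking rule is an arbitrary choice, and the GAN generation and FID screening steps are not shown.

```python
import random
from collections import Counter


def m_by_2_splits(n_samples, m, seed=0):
    """Yield 2*m (train_idx, test_idx) pairs from m random halvings.

    Assumption: plain random halvings; the paper's block-regularized
    scheme additionally controls the overlap between partitions.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    half = n_samples // 2
    for _ in range(m):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        first, second = shuffled[:half], shuffled[half:]
        # Each halving is used twice: train on one half, test on the other.
        yield first, second
        yield second, first


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample (the mode).

    Assumption: ties are broken by taking the smallest tied label.
    """
    voted = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted
```

In such a setup, each of the 2m training halves would train one sub-model (a random forest in the paper), and `majority_vote` would combine the sub-models' predicted labels for new modules.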