Unbalanced Data Defect Prediction Model Based on Multi-level Oversampling Integration  (Cited by: 6)


Authors: RAO Zhen-dan; LI Ying-mei[1]; DONG Hao; ZHANG Tong (School of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, China)

Affiliation: [1] School of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, China

Source: Journal of Chinese Computer Systems, 2023, No. 4, pp. 888-896 (9 pages)

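Funding: Supported by the Natural Science Foundation of Heilongjiang Province (F2017021); the Harbin Normal University Master's Student Innovative Research Project (HSDSSCX2021-49); and the Research Project of the School of Computer Science, Harbin Normal University (JKYKYY202003).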

Abstract: To address the classification of imbalanced data in software defect prediction, this paper proposes XG-AJCC (AJCC-Ram + XGBoost), a class-imbalance software defect prediction model based on oversampling and ensemble learning. In the preprocessing stage, a multi-level oversampling method, AJCC-Ram (Adaptive Judgment Cure Clustering Random Sampling), is proposed. The method uses improved ADASYN adaptive oversampling and CURE-SMOTE oversampling to generate new samples at the class-edge and class-center levels, respectively, and then applies the CLNI method to filter and clean noise from the resulting data set. In the model construction stage, AJCC-Ram is combined with the ensemble algorithm XGBoost (eXtreme Gradient Boosting) to form the final imbalanced-data defect prediction model. The approach is validated on the AEEEM and NASA data sets. The experimental results show that, compared with classical sampling methods and sampling-plus-ensemble prediction models, both the AJCC-Ram oversampling method and the XG-AJCC model achieve more effective prediction results on the F1 metric.
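The class-edge step mentioned in the abstract builds on ADASYN, whose general idea is to generate more synthetic minority samples around minority points that are surrounded by majority-class neighbours. The following is only a minimal sketch of that general idea in plain Python, not the improved ADASYN variant or the AJCC-Ram method from the paper, whose details are not given here. The function names `nearest_indices` and `adasyn_like`, the toy data, and the choice `k=3` are all assumptions made for illustration; the CURE-SMOTE, CLNI, and XGBoost stages are not shown.

```python
import math
import random


def nearest_indices(i, points, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def adasyn_like(X, y, minority=1, k=3, seed=0):
    """Hypothetical helper: create synthetic minority samples, favouring
    minority points whose neighbourhood contains majority-class points."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    # Number of samples needed to match the majority-class count.
    n_needed = (len(y) - len(minority_idx)) - len(minority_idx)
    if n_needed <= 0 or len(minority_idx) < 2:
        return []

    # Difficulty of each minority point: share of majority-class neighbours.
    ratios = []
    for i in minority_idx:
        neigh = nearest_indices(i, X, k)
        ratios.append(sum(y[j] != minority for j in neigh) / len(neigh))
    total = sum(ratios)
    # Fall back to uniform weights if no minority point has majority neighbours.
    if total > 0:
        weights = [r / total for r in ratios]
    else:
        weights = [1 / len(ratios)] * len(ratios)

    minority_pts = [X[i] for i in minority_idx]
    kk = min(k, len(minority_pts) - 1)
    synthetic = []
    for _ in range(n_needed):
        # Pick a seed point in proportion to its difficulty.
        m = rng.choices(range(len(minority_pts)), weights=weights)[0]
        # Interpolate towards one of its minority-class neighbours.
        n = rng.choice(nearest_indices(m, minority_pts, kk))
        lam = rng.random()
        a, b = minority_pts[m], minority_pts[n]
        synthetic.append([ai + lam * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic


# Toy data (assumed): 3 defective modules (label 1) vs 6 clean ones (label 0).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9],
     [0.9, 1.1], [1.2, 1.2], [1.0, 0.8], [0.3, 0.3]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]
new_points = adasyn_like(X, y)
print(len(new_points))  # 3 synthetic points, enough to balance 3 vs 6
```

Each synthetic point lies on the line segment between two existing minority samples, so it stays inside the minority region. Note that reference ADASYN assigns a fixed count of synthetic samples to each minority point, whereas this sketch draws seed points at random with matching probabilities, which is a simplification.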

Keywords: software defect prediction; class imbalance; oversampling; XGBoost

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
