Prospect theory-based oversampling for software defect prediction (original title: 基于前景理论的软件缺陷预测过采样方法)


Authors: XU Biao, YAN Yuanting, ZHANG Yiwen [1,2]

Affiliations: [1] Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei 230601, Anhui, China; [2] School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, China

Source: Computer Integrated Manufacturing Systems (《计算机集成制造系统》), 2024, No. 8, pp. 2822-2831 (10 pages)

Funding: Supported by the National Natural Science Foundation of China (61806002, 62272001).

Abstract: In software defect prediction, data difficulty factors affect prediction performance more strongly than class imbalance itself. However, most existing oversampling methods for software defect prediction ignore the data difficulty factors inherent in software project datasets when addressing class imbalance, which leads to poor prediction performance. To address this problem, a Prospect theory-based Over Sampling algorithm (POS) is proposed. POS evaluates the learning difficulty of each minority-class sample by considering the influence of both same-class (homogeneous) and other-class (heterogeneous) samples in its local neighborhood. Specifically, it uses a gravity-based strategy to construct homogeneous gains and heterogeneous losses that characterize the prospect value of each sample, and it emphasizes the heterogeneous losses when computing the sampling weights of minority samples. This reduces the risk of introducing data difficulty factors, improves the quality of the synthetic samples, and in turn improves prediction performance. Experimental results on the NASA datasets show that POS outperforms the comparison algorithms in terms of AUC, balance, and G-mean.
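The abstract gives the idea behind POS but not its formulas, so the following is only a rough sketch of how prospect-weighted oversampling could look, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the function name `prospect_oversample`, unit-mass inverse-square "gravity", the loss-aversion coefficient 2.25 (the classic Tversky and Kahneman estimate), the choice to give higher sampling weight to samples with higher prospect value, and the SMOTE-style interpolation used to synthesize new samples.

```python
import numpy as np

def prospect_oversample(X, y, n_new, k=5, loss_aversion=2.25, minority=1, seed=0):
    """Hypothetical prospect-weighted oversampling (illustrative, not the paper's POS)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority)
    if len(min_idx) < 2:
        raise ValueError("need at least two minority samples")

    # Distances from every minority sample to every sample; exclude self-matches.
    d = np.linalg.norm(X[min_idx, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(min_idx)), min_idx] = np.inf

    prospect = np.empty(len(min_idx))
    for i in range(len(min_idx)):
        nn = np.argsort(d[i])[:k]                  # local neighborhood
        force = 1.0 / (d[i, nn] ** 2 + 1e-12)      # assumed gravity: unit masses, 1/d^2
        same = y[nn] == minority
        gain = force[same].sum()                   # pull from same-class neighbors
        loss = force[~same].sum()                  # pull from other-class neighbors
        # Assumed prospect value: losses weigh more than gains (loss aversion),
        # normalized so it lies in [-loss_aversion, 1].
        prospect[i] = (gain - loss_aversion * loss) / (gain + loss)

    # Assumed weighting: shift to positive values, then normalize to probabilities.
    weights = prospect + loss_aversion + 1e-6
    weights /= weights.sum()

    # SMOTE-style synthesis between a weighted seed and one of its minority neighbors.
    Xm = X[min_idx]
    dm = np.linalg.norm(Xm[:, None, :] - Xm[None, :, :], axis=2)
    np.fill_diagonal(dm, np.inf)
    k_min = min(k, len(min_idx) - 1)
    nbrs = np.argsort(dm, axis=1)[:, :k_min]
    seeds = rng.choice(len(min_idx), size=n_new, p=weights)
    mates = nbrs[seeds, rng.integers(0, k_min, size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = Xm[seeds] + gap * (Xm[mates] - Xm[seeds])
    return synthetic, weights
```

Under these assumptions, a minority sample surrounded by majority samples receives a strongly negative prospect value and is rarely chosen as a seed, which is one way to avoid amplifying noisy or overlapping regions. The published method may weight samples differently.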

Keywords: software defect prediction; class imbalance; data difficulty factors; oversampling; prospect theory

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
