Cost-Sensitive Boosting Software Defect Prediction Method with Integrated Feature Selection


Authors: Tang Helong (唐鹤龙), Li Yingmei (李英梅)[1]

Affiliation: [1] College of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang

Source: Software Engineering and Applications (《软件工程与应用》), 2023, No. 6, pp. 975-988 (14 pages)

Abstract: Latent defects in software can have serious consequences, and software defect prediction techniques can detect defective modules in time. However, class imbalance and high-dimensional features in software defect datasets degrade prediction performance. To address this, a cost-sensitive Boosting software defect prediction method with integrated feature selection (Cost-Sensitive Boosting for Feature Selection, CSBFS) is proposed. CSBFS first employs a cost-sensitive feature selection algorithm, which computes each feature's contribution to the prediction result, adjusts that contribution according to the costs of the different error types, and selects the features with positive contribution as the feature subset, thereby addressing the high-dimensionality problem. This feature selection algorithm is then embedded in the Boosting algorithm: in each Boosting iteration a suitable feature subset is selected for each base learner, which increases the diversity among base learners. In addition, the weights of the error types are adjusted so that samples misclassified with the first type of error receive higher weights, which alleviates the class imbalance problem and further improves prediction performance. Experiments on 20 public datasets, with F-measure, Recall, AUC, and G-mean among the evaluation metrics, validate the effectiveness of CSBFS.
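
The pipeline described in the abstract (per-round cost-sensitive feature selection inside a Boosting loop, plus a cost-aware weight update) can be illustrated with a minimal sketch. Everything concrete here is an assumption and not the paper's definition: decision stumps stand in for the base learner, a feature's "contribution" is taken to be the reduction in cost-weighted error relative to the best constant prediction, the costs `cost_fn` and `cost_fp` are arbitrary, and the weight update is an AdaCost-style variant. All function names are hypothetical.

```python
import math

def stump_predict(row, feat, thr, sign):
    # Decision stump: predict `sign` above the threshold, `-sign` otherwise.
    return sign if row[feat] > thr else -sign

def best_stump(X, y, w, feat):
    # Exhaustive threshold search on one feature; returns (error, thr, sign).
    values = sorted({row[feat] for row in X})
    best = (float("inf"), None, None)
    for thr in [values[0] - 1.0] + values:  # first threshold yields a constant stump
        for sign in (1, -1):
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if stump_predict(row, feat, thr, sign) != yi)
            if err < best[0]:
                best = (err, thr, sign)
    return best

def cost_weights(y, w, cost_fn, cost_fp):
    # Scale each sample weight by the (assumed) cost of misclassifying its class.
    cw = [wi * (cost_fn if yi == 1 else cost_fp) for yi, wi in zip(y, w)]
    total = sum(cw)
    return [c / total for c in cw]

def select_features(X, y, cw, eps=1e-9):
    # Assumed contribution: baseline cost-weighted error minus the stump's error.
    pos = sum(c for yi, c in zip(y, cw) if yi == 1)
    baseline = min(pos, 1.0 - pos)  # best constant prediction
    selected = {}
    for feat in range(len(X[0])):
        err, thr, sign = best_stump(X, y, cw, feat)
        if baseline - err > eps:  # keep only features with positive contribution
            selected[feat] = (err, thr, sign)
    return selected

def fit_sketch(X, y, rounds=5, cost_fn=3.0, cost_fp=1.0):
    # y uses +1 for defective and -1 for clean modules.
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        selected = select_features(X, y, cost_weights(y, w, cost_fn, cost_fp))
        if not selected:
            break
        feat, (_, thr, sign) = min(selected.items(), key=lambda kv: kv[1][0])
        preds = [stump_predict(row, feat, thr, sign) for row in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        if err >= 0.5:
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feat, thr, sign, sorted(selected)))
        # Cost-aware update: misclassified samples grow by exp(alpha) times their class cost.
        w = [wi * (math.exp(alpha) * (cost_fn if yi == 1 else cost_fp) if p != yi
                   else math.exp(-alpha))
             for wi, p, yi in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(a * stump_predict(row, f, t, s) for a, f, t, s, _ in ensemble)
    return 1 if score >= 0 else -1

# Toy imbalanced data: feature 0 is informative, feature 1 is constant, feature 2 is noise.
X = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1], [5, 5, 0], [6, 5, 1], [9, 5, 0], [10, 5, 1]]
y = [-1, -1, -1, -1, -1, -1, 1, 1]
ensemble = fit_sketch(X, y)
predictions = [predict(ensemble, row) for row in X]
```

Each ensemble entry records the feature subset chosen in that round, so the effect of the selection step can be inspected directly. With a richer base learner trained on the whole selected subset, the per-round subsets would be the source of the diversity the abstract refers to.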

Keywords: software defect prediction; cost-sensitive; feature selection; ensemble learning

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
