Affiliation: [1] School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang
Source: Software Engineering and Applications (《软件工程与应用》), 2023, No. 6, pp. 975-988 (14 pages)
Abstract: Latent defects in software can have serious consequences, and software defect prediction techniques can detect defective modules in time. However, class imbalance and high-dimensional features in software defect datasets degrade a model's prediction performance. To address this, the paper proposes a cost-sensitive Boosting method with integrated feature selection for software defect prediction (Cost-Sensitive Boosting for Feature Selection, CSBFS). CSBFS first applies a cost-sensitive feature selection algorithm: it computes each feature's contribution to the prediction result, adjusts that contribution according to the costs of the different error types, and keeps the features with a positive contribution as the feature subset, which addresses the high-dimensionality problem. This feature selection algorithm is then embedded in the Boosting algorithm, so that each Boosting iteration selects a suitable feature subset for its base learner, which increases the diversity among base learners. In addition, the weights of the error types are adjusted so that misclassified samples of the first type receive higher weights, which alleviates class imbalance and further improves prediction. Experiments on 20 public datasets, evaluated with F-measure, Recall, AUC, G-mean, and other metrics, validate the effectiveness of CSBFS.
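The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch of the general idea rather than the authors' CSBFS implementation. Everything concrete in it is an assumption made for illustration: the decision-stump base learner, the contribution score (how much a feature's best stump lowers cost-weighted error compared with the best constant prediction), the cost values in `cost`, the AdaBoost-style weight update, the toy data, and all function names. Labels are assumed to be `1` for defective and `-1` for clean modules.

```python
import math


def stump(x, f, thr, sign):
    # Decision stump on feature f: predicts `sign` when x[f] > thr, else -sign.
    return sign if x[f] > thr else -sign


def best_stump(X, y, w, f):
    # Lowest weighted-error stump on feature f, as (error, f, threshold, sign).
    best = None
    for thr in sorted({x[f] for x in X}):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump(xi, f, thr, sign) != yi)
            if best is None or err < best[0]:
                best = (err, f, thr, sign)
    return best


def select_features(X, y, w, cost):
    # Assumed contribution score: reduction in cost-weighted error of the
    # best stump on a feature relative to the best constant prediction.
    cw = [wi * cost[yi] for wi, yi in zip(w, y)]
    baseline = min(sum(c for c, yi in zip(cw, y) if yi == 1),
                   sum(c for c, yi in zip(cw, y) if yi == -1))
    chosen = [f for f in range(len(X[0]))
              if baseline - best_stump(X, y, cw, f)[0] > 1e-12]
    # Fall back to all features if none has a positive contribution.
    return chosen or list(range(len(X[0])))


def fit(X, y, rounds=5, cost=None):
    # Assumed costs: missing a defect is twice as costly as a false alarm.
    cost = cost or {1: 2.0, -1: 1.0}
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        # Re-select the feature subset under the current sample weights.
        feats = select_features(X, y, w, cost)
        err, f, thr, sign = min(best_stump(X, y, w, f) for f in feats)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, sign))
        # Cost-sensitive reweighting: a misclassified sample gains weight
        # in proportion to the cost of its true class.
        w = [wi * (math.exp(alpha * cost[yi])
                   if stump(xi, f, thr, sign) != yi else math.exp(-alpha))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    # Weighted vote of all stumps; a positive score means "defective".
    score = sum(a * stump(x, f, thr, sign) for a, f, thr, sign in model)
    return 1 if score > 0 else -1


# Toy data (made up): feature 0 is informative, feature 1 is constant,
# feature 2 is noisy. Four clean modules and two defective ones.
X = [[1.0, 5.0, 0.2], [2.0, 5.0, 0.9], [3.0, 5.0, 0.4],
     [4.0, 5.0, 0.7], [8.0, 5.0, 0.3], [9.0, 5.0, 0.8]]
y = [-1, -1, -1, -1, 1, 1]

model = fit(X, y)
predictions = [predict(model, x) for x in X]
```

In this sketch the constant feature is dropped because no stump on it can beat a constant prediction. Since feature selection is repeated with the updated sample weights in every round, different rounds can see different subsets, which is one plausible way to obtain the base-learner diversity the abstract mentions.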
Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]