Cross-project defect prediction method based on feature selection and TrAdaBoost (cited by: 4)

Authors: LI Li [1]; SHI Kexin; REN Zhenkang (College of Information and Computer Engineering, Northeast Forestry University, Harbin, Heilongjiang 150040, China)

Affiliation: [1] College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040

Source: Journal of Computer Applications (《计算机应用》), 2022, No. 5, pp. 1554-1562 (9 pages)

Abstract: Cross-project software defect prediction can address the shortage of training data in the project being predicted. However, the source project and the target project usually differ substantially in data distribution, which reduces prediction performance. To address this problem, a Cross-Project Defect Prediction method based on Feature Selection and TrAdaBoost (CPDP-FSTr) was proposed. First, in the feature selection stage, Kernel Principal Component Analysis (KPCA) was used to remove redundant data from the source project. Then, based on the attribute feature distributions of the source and target projects, the candidate source project data closest in distribution to the target project were selected by distance. Finally, in the instance transfer stage, a TrAdaBoost method improved with an evaluation factor was used to find the source project instances whose distribution is similar to that of the small number of labeled instances in the target project, and a defect prediction model was built. With F1 as the evaluation metric, CPDP-FSTr was compared with cross-project software defect prediction using Feature Clustering and TrAdaBoost (FeCTrA) and Cross-project software defect prediction based on Multiple Kernel Ensemble Learning (CMKEL). Its prediction performance improved by 5.84% and 105.42%, respectively, on the AEEEM dataset and by 5.25% and 85.97%, respectively, on the NASA dataset, and its two-stage feature selection outperformed a single feature selection stage. Experimental results show that CPDP-FSTr achieves good prediction performance when the source project feature selection proportion is 60% and the proportion of labeled instances in the target project is 20%.
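To make the instance transfer stage more concrete, the following is a minimal sketch of the original TrAdaBoost weighting scheme, not the evaluation-factor variant proposed in the paper, whose details the abstract does not give. It assumes binary labels (1 = defective, 0 = clean) and uses a decision stump as the weak learner. The function names (`stump_predict`, `fit_stump`, `tradaboost`, `predict`) and the toy data are hypothetical and chosen only for illustration; the KPCA and distance-based selection stages are omitted.

```python
import math

def stump_predict(stump, row):
    """Predict 1 (defective) or 0 (clean) with a one-feature threshold rule."""
    j, thr, pol = stump
    return 1 if pol * (row[j] - thr) > 0 else 0

def fit_stump(X, y, p):
    """Exhaustively pick the stump with the lowest weighted error under p."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                stump = (j, thr, pol)
                err = sum(pi for row, yi, pi in zip(X, y, p)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best

def tradaboost(Xs, ys, Xt, yt, n_rounds=10):
    """Original TrAdaBoost: Xs/ys = source data, Xt/yt = labeled target data."""
    n, m = len(Xs), len(Xt)
    X, y = Xs + Xt, ys + yt
    w = [1.0] * (n + m)
    # Fixed shrink factor applied to misclassified source instances.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    learners = []
    for _ in range(n_rounds):
        total = sum(w)
        p = [wi / total for wi in w]
        stump = fit_stump(X, y, p)
        miss = [int(stump_predict(stump, row) != yi) for row, yi in zip(X, y)]
        # The error is measured on the target instances only.
        eps = sum(w[i] * miss[i] for i in range(n, n + m)) / sum(w[n:])
        # Simplification (assumption): clamp eps so beta_t stays in (0, 1).
        eps = min(max(eps, 1e-10), 0.499)
        beta_t = eps / (1.0 - eps)
        for i in range(n):            # source: shrink weight when misclassified
            w[i] *= beta_src ** miss[i]
        for i in range(n, n + m):     # target: grow weight when misclassified
            w[i] *= beta_t ** (-miss[i])
        learners.append((stump, beta_t))
    # Only the learners from round ceil(N/2) onward take part in the final vote.
    return learners[math.ceil(n_rounds / 2) - 1:], w

def predict(model, row):
    """Weighted vote of the retained stumps, with weight ln(1 / beta_t) each."""
    score = sum(math.log(1.0 / b) * stump_predict(s, row) for s, b in model)
    threshold = 0.5 * sum(math.log(1.0 / b) for _, b in model)
    return 1 if score >= threshold else 0

# Toy data (hypothetical): one feature; the last two source labels conflict
# with the pattern seen in the labeled target instances.
Xs = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0], [2.5], [7.5]]
ys = [0, 0, 0, 1, 1, 1, 1, 0]
Xt = [[4.0], [4.5], [5.5], [6.0]]
yt = [0, 0, 1, 1]

model, weights = tradaboost(Xs, ys, Xt, yt)
print([predict(model, [2.0]), predict(model, [8.0])])
print([round(wi, 4) for wi in weights[:len(Xs)]])
```

On this toy data, the two source instances whose labels conflict with the target pattern are misclassified in every round, so their weights shrink geometrically while the weights of the consistent source instances stay unchanged. This is the mechanism by which TrAdaBoost keeps the source instances that resemble the labeled target data.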

Keywords: cross-project defect prediction; feature selection; Kernel Principal Component Analysis (KPCA); instance transfer; TrAdaBoost

Classification code: TP311.5 [Automation and Computer Technology: Computer Software and Theory]
