Authors: LI Li (李莉)[1]; SHI Kexin (石可欣); REN Zhenkang (任振康)
Affiliation: [1] College of Information and Computer Engineering, Northeast Forestry University, Harbin, Heilongjiang 150040, China
Source: Journal of Computer Applications (《计算机应用》), 2022, No. 5, pp. 1554-1562 (9 pages)
Abstract: Cross-project software defect prediction can address the shortage of training data in the project being predicted. However, the source project and the target project usually differ substantially in data distribution, which reduces prediction performance. To address this problem, a Cross-Project Defect Prediction method based on Feature Selection and TrAdaBoost (CPDP-FSTr) is proposed. First, in the feature selection stage, Kernel Principal Component Analysis (KPCA) is used to remove redundant data from the source project. Then, based on the attribute feature distributions of the source and target projects, the candidate source project data closest in distribution to the target project are selected by distance. Finally, in the instance transfer stage, a TrAdaBoost method improved with an evaluation factor is used to find the source project instances whose distribution is close to that of the small number of labeled instances in the target project, and a defect prediction model is built. With F1 as the evaluation metric, compared with cross-project software defect prediction using Feature Clustering and TrAdaBoost (FeCTrA) and Cross-project software defect prediction based on Multiple Kernel Ensemble Learning (CMKEL), CPDP-FSTr improves prediction performance by 5.84% and 105.42% respectively on the AEEEM dataset and by 5.25% and 85.97% respectively on the NASA dataset, and its two-stage feature selection outperforms a single feature selection process. Experimental results show that CPDP-FSTr achieves relatively good prediction performance when the source project feature selection proportion and the target project labeled instance proportion are 60% and 20% respectively.
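Keywords: cross-project defect prediction; feature selection; Kernel Principal Component Analysis (KPCA); instance transfer; TrAdaBoost
Classification: TP311.5 [Automation and Computer Technology: Computer Software and Theory]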
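The instance transfer stage in the abstract builds on TrAdaBoost. As a rough illustration, the following is a minimal sketch of one weight update of the standard TrAdaBoost algorithm, not of the evaluation-factor variant proposed in the paper, whose details the abstract does not give. The function name `tradaboost_update` and all parameter names are hypothetical, and the weak learner is assumed to have been trained elsewhere, so only its per-instance misclassification flags are passed in.

```python
import math


def tradaboost_update(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One weight update of standard TrAdaBoost (illustrative sketch).

    w_src, w_tgt: current weights of source / labeled target instances.
    miss_src, miss_tgt: 1 if the current weak learner misclassified
        the instance, 0 otherwise.
    n_rounds: total number of boosting rounds planned.
    """
    n = len(w_src)
    # Weighted error is measured on the labeled target instances only.
    eps = sum(w * m for w, m in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    if not 0.0 < eps < 0.5:
        raise ValueError("target error must lie strictly between 0 and 0.5")
    beta_t = eps / (1.0 - eps)
    # Fixed factor (at most 1) applied to misclassified source instances.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    # Misclassified source instances lose weight: they look unlike the target.
    new_src = [w * beta ** m for w, m in zip(w_src, miss_src)]
    # Misclassified target instances gain weight, as in AdaBoost.
    new_tgt = [w * beta_t ** (-m) for w, m in zip(w_tgt, miss_tgt)]
    return new_src, new_tgt
```

Over repeated rounds, source instances that keep disagreeing with the labeled target data shrink toward zero weight, which is how instances with a distribution close to the target come to dominate the training set.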