Cross-project Clone Consistency Prediction via Transfer Learning and Oversampling Technology

Authors: OUYANG Peng, LU Lu[1,2], ZHANG Fan-long, QIU Shao-jian

Affiliations: [1] School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China; [2] Technology Research Institute, South China University of Technology, Meizhou, Guangdong 514021, China; [3] School of Computers, Guangdong University of Technology, Guangzhou 510006, China; [4] School of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China

Source: Computer Science, 2020, No. 9, pp. 10-16 (7 pages)

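Funding: National Natural Science Foundation of China (61370103); Guangzhou Industry-University-Research Fund (201902020004); Meizhou Industry-University-Research Project (2019A0101019).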

Abstract: In recent years, as software requirements have grown, developers have introduced large amounts of cloned code into projects by reusing existing code. As software versions are iterated and updated, cloned code changes, and these changes can incur additional maintenance cost and gradually become a burden on software maintenance. Researchers have applied machine learning to predict clone consistency-maintenance requirements: by predicting whether a change to cloned code will incur additional maintenance cost, they help software quality assurance teams allocate maintenance resources more effectively, improving efficiency and reducing operation and maintenance costs. However, in the early stages of development a software project has often not evolved enough to supply the historical data needed to build an effective prediction model, which motivated cross-project clone consistency prediction methods. Taking the reduction of cross-project data distribution discrepancy as its starting point, this paper proposes CPCCP+, a cross-project clone consistency prediction method based on transfer learning and oversampling. CPCCP+ maps the training and test sets into a kernel space, reduces the distribution discrepancy of the cross-project data with transfer component analysis (TCA), and then addresses the class imbalance of the data, thereby improving the performance of the cross-project prediction model. Seven open-source datasets are used, forming 42 cross-project clone consistency prediction tasks in total. CPCCP+ is compared with methods that use only a base classifier, with Precision, Recall and F-Measure as evaluation metrics. The experimental results show that CPCCP+ predicts cross-project clone consistency-maintenance requirements more effectively.
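The abstract names the stages of CPCCP+ (kernel mapping with transfer component analysis, class-imbalance handling, evaluation by Precision, Recall and F-Measure) but not their settings. The following is a minimal sketch, under stated assumptions, of how such a pipeline can fit together. It is not the authors' implementation. The RBF kernel and the parameters `dim`, `mu` and `gamma` are assumptions. SMOTE-style interpolation is a stand-in for the unspecified oversampling technique. `nearest_centroid` is a hypothetical placeholder for the paper's base classifiers. All function names are illustrative. The sketch also confirms that 7 projects yield 7 × 6 = 42 ordered source/target tasks, which matches the task count reported above.

```python
# Illustrative sketch only: the kernel, parameters, SMOTE-style oversampling and the
# nearest-centroid classifier are assumptions, not the settings used by CPCCP+.
import numpy as np
from itertools import permutations


def cross_project_tasks(projects):
    """Every ordered (source, target) pair with source != target."""
    return list(permutations(projects, 2))


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and B (assumed kernel)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def tca(Xs, Xt, dim=2, mu=1.0, gamma=1.0):
    """Transfer component analysis: embed source and target rows in a shared
    space where the distance between their kernel means (MMD) is reduced."""
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([Xs, Xt])
    K = rbf_kernel(X, X, gamma)
    # L = e e^T, so trace(K L) is the squared MMD between source and target.
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])[:, None]
    L = e @ e.T
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    # Generalized eigenproblem (K H K) w = lambda (K L K + mu I) w, reduced to a
    # symmetric one through the Cholesky factor C of the right-hand matrix.
    C = np.linalg.cholesky(K @ L @ K + mu * np.eye(n))
    Cinv = np.linalg.inv(C)
    M = Cinv @ (K @ H @ K) @ Cinv.T
    _, vecs = np.linalg.eigh((M + M.T) / 2)  # eigenvalues in ascending order
    W = Cinv.T @ vecs[:, ::-1][:, :dim]      # keep the `dim` leading components
    Z = K @ W
    return Z[:ns], Z[ns:]


def smote(X, y, minority=1, k=3, seed=0):
    """SMOTE-style oversampling (assumed technique): interpolate between minority
    neighbours until the minority class is as large as the majority class."""
    rng = np.random.default_rng(seed)
    Xm = X[y == minority]
    n_new = int((y != minority).sum()) - len(Xm)
    if n_new <= 0 or len(Xm) < 2:
        return X, y
    d = ((Xm[:, None, :] - Xm[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :min(k, len(Xm) - 1)]
    base = rng.integers(0, len(Xm), n_new)
    nbr = nn[base, rng.integers(0, nn.shape[1], n_new)]
    synth = Xm[base] + rng.random((n_new, 1)) * (Xm[nbr] - Xm[base])
    return np.vstack([X, synth]), np.concatenate([y, np.full(n_new, minority)])


def nearest_centroid(Ztr, ytr, Zte):
    """Hypothetical stand-in base classifier: assign each row to the closest class mean."""
    classes = np.unique(ytr)
    cents = np.stack([Ztr[ytr == c].mean(0) for c in classes])
    d = ((Zte[:, None, :] - cents[None, :, :]) ** 2).sum(-1)
    return classes[d.argmin(1)]


def prf(y_true, y_pred, positive=1):
    """Precision, Recall and F-Measure for the positive class."""
    tp = int(((y_pred == positive) & (y_true == positive)).sum())
    fp = int(((y_pred == positive) & (y_true != positive)).sum())
    fn = int(((y_pred != positive) & (y_true == positive)).sum())
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Usage on synthetic data (not the paper's datasets).
rng = np.random.default_rng(42)
Xs = rng.normal(0.0, 1.0, (40, 5))
ys = np.array([1] * 8 + [0] * 32)
Xt = rng.normal(0.5, 1.2, (30, 5))
yt = np.array([1] * 6 + [0] * 24)
Zs, Zt = tca(Xs, Xt, dim=3, mu=1.0, gamma=0.1)
Zs_bal, ys_bal = smote(Zs, ys, minority=1, k=3)
pred = nearest_centroid(Zs_bal, ys_bal, Zt)
print(len(cross_project_tasks(range(7))), prf(yt, pred))
```

Oversampling is applied only to the labelled source embedding and never to the target project, because the target labels are used only for evaluation. Running TCA first and oversampling second follows the order in which the abstract lists the steps. That ordering is still an assumption about the actual method.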

Keywords: Clone code; Cross-project prediction; Consistency change; Transfer learning; Oversampling technique

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]
