Code Smell Detection Method Combining Feature Selection and Multi-model Soft-voting Ensemble Learning  Cited by: 1


Authors: 黄晨峻 (HUANG Chenjun); 高建华 (GAO Jianhua) [1] (Department of Computer Science and Technology, Shanghai Normal University, Shanghai 200234, China)

Affiliation: [1] Department of Computer Science and Technology, Shanghai Normal University, Shanghai 200234, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2025, No. 2, pp. 504-512 (9 pages)

Funding: Supported by the National Natural Science Foundation of China (61672355).

Abstract: Code smells gradually degrade software quality and reduce the understandability and maintainability of software. To detect code smells in software structure, this paper proposes a detection method based on CK metrics that combines two-step feature selection with soft-voting ensemble learning. The method first performs feature selection: the Pearson correlation coefficient is used to remove redundant features, and XGBoost feature importance is then used to select the most relevant of the remaining metrics. Next, to address the poor generalization of a single machine learning model, a soft-voting ensemble built on five mature machine learning models is proposed to carry out the code smell classification task. The experiments are based on CK metrics and use a dataset covering 7 open-source projects and 4 types of code smell. The results show that the method reduces the feature dimension and outperforms other classification models on the performance metrics, with the F1 score improving by up to 3.24% and AUC by up to 2.32%.
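The pipeline summarized in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The 0.9 correlation threshold, the toy metric values, the importance scores (which stand in for XGBoost feature importances), and the five probability vectors (which stand in for the five unnamed base models) are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)


def drop_redundant(columns, threshold=0.9):
    """Step 1: keep a metric only if |r| with every kept metric is below threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def top_k_by_importance(names, importance, k):
    """Step 2: keep the k metrics with the highest importance score."""
    return sorted(names, key=lambda n: importance[n], reverse=True)[:k]


def soft_vote(prob_lists):
    """Average per-class probabilities over all models; return (class index, averages)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


# Toy CK-style metric columns (made-up values, one entry per class under analysis).
metrics = {
    "WMC": [5, 12, 30, 8, 22],
    "RFC": [10, 24, 60, 16, 44],  # exactly 2 * WMC, so it is redundant
    "DIT": [1, 3, 2, 5, 1],
    "LCOM": [7, 2, 9, 4, 6],
}
kept = drop_redundant(metrics)

# Hypothetical importance scores standing in for XGBoost feature importances.
importance = {"WMC": 0.5, "DIT": 0.1, "LCOM": 0.4}
selected = top_k_by_importance(kept, importance, k=2)

# Hypothetical [P(not smelly), P(smelly)] outputs of five base models for one sample.
probs = [[0.4, 0.6], [0.7, 0.3], [0.2, 0.8], [0.45, 0.55], [0.6, 0.4]]
label, avg = soft_vote(probs)
```

With real data, the importance scores would normally come from a fitted XGBoost model's `feature_importances_`, and the averaging could be delegated to scikit-learn's `VotingClassifier(voting="soft")`. Soft voting averages predicted probabilities rather than counting class votes, so a confident model influences the result more than an uncertain one.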

Keywords: code smell; feature selection; CK metrics; voting model; ensemble learning

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
