Robust Boundary-Enhanced GMM-SMOTE Software Defect Detection Method

Times cited: 5

Authors: LUO Senlin[1], SU Xia, PAN Limin[1] (School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China)

Affiliation: [1] School of Information and Electronics, Beijing Institute of Technology, Beijing 100081

Source: Transactions of Beijing Institute of Technology, 2021, No. 3, pp. 303-310 (8 pages)

Funding: National Science and Technology Support Program of the 13th Five-Year Plan (SQ2018YFC200004).

Abstract: Software defects are bugs that can disrupt the normal operation of a system or software, and detecting and locating them is costly. Automated defect detection models built on software big data have become an important tool for defect discovery. In such data, accurately labeled defective samples are rare, while missing labels and mislabeling are common; as a result, existing machine-learning data-balancing methods tend to amplify noise and blur the classification boundary. To address this problem, a robust boundary-enhanced GMM-SMOTE software defect detection method is proposed. The method uses Gaussian mixture clustering to partition the software data set into multiple clusters, selects reliable samples based on the intra-cluster class ratio, and identifies boundary samples from posterior probabilities. These results guide a weighted data-balancing step, and the balanced, optimized data are finally used to build the software defect detection model. Experimental results on multiple public NASA data sets show that GMM-SMOTE achieves data balancing with noise suppression and boundary enhancement, effectively improves software defect detection, and has considerable practical value.
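The pipeline summarized in the abstract (Gaussian mixture clustering, reliable-sample filtering by intra-cluster class ratio, posterior-based boundary identification, then weighted oversampling) can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract gives no formulas, so two rules here are hypothetical choices: the noise rule (discard a defective sample as a seed when its cluster's defect ratio falls below `noise_factor` times the overall defect ratio) and the boundary weight (largest where an estimated defect posterior is near 0.5). The names `fit_gmm`, `gmm_posterior`, `gmm_smote` and their parameters are likewise hypothetical, and a small diagonal-covariance EM loop stands in for a library GMM such as scikit-learn's `GaussianMixture`.

```python
import numpy as np


def gmm_posterior(X, weights, means, var):
    """Posterior p(component k | x) for a diagonal-covariance Gaussian mixture."""
    diff = X[:, None, :] - means[None, :, :]                     # (n, k, d)
    log_pdf = -0.5 * (np.sum(diff ** 2 / var[None], axis=2)
                      + np.sum(np.log(2 * np.pi * var), axis=1)[None])
    log_joint = log_pdf + np.log(weights)[None]
    log_joint -= log_joint.max(axis=1, keepdims=True)           # numerical stability
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)


def fit_gmm(X, k, n_iter=100, seed=0):
    """Plain EM for a diagonal-covariance GMM (hypothetical helper, not a library API)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, size=k, replace=False)]
    var = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = gmm_posterior(X, weights, means, var)            # E-step
        nk = resp.sum(axis=0) + 1e-10                           # M-step
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        var = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, var


def gmm_smote(X, y, n_components=4, noise_factor=0.5, k_neighbors=3, seed=0):
    """Hypothetical GMM-guided, boundary-weighted SMOTE; y == 1 marks defective modules."""
    rng = np.random.default_rng(seed)
    weights, means, var = fit_gmm(X, n_components, seed=seed)
    resp = gmm_posterior(X, weights, means, var)                # p(k | x)
    cluster = resp.argmax(axis=1)
    minority = y == 1

    # Intra-cluster defect ratio r_k = (#defective in k) / (#samples in k).
    counts = np.bincount(cluster, minlength=n_components)
    defects = np.bincount(cluster[minority], minlength=n_components)
    ratio = defects / np.maximum(counts, 1)

    # Reliable-sample filter (assumed rule): defective samples in clusters
    # dominated by clean modules are treated as likely mislabels.
    reliable = minority & (ratio[cluster] >= noise_factor * minority.mean())
    seeds = X[reliable]
    if len(seeds) < 2:
        raise ValueError("need at least two reliable defective samples")

    # Boundary weight (assumed rule): soft defect posterior
    # P(defect | x) ~ sum_k p(k | x) * r_k, weighted highest near 0.5.
    p_def = resp[reliable] @ ratio
    w = 1.0 - np.abs(2.0 * p_def - 1.0) + 1e-3
    probs = w / w.sum()

    n_new = int((~minority).sum() - minority.sum())
    if n_new <= 0:
        return X, y, reliable

    # Weighted SMOTE: pick seeds by boundary weight, then interpolate toward
    # a random one of their k nearest reliable defective neighbors.
    dist = np.linalg.norm(seeds[:, None, :] - seeds[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    k = min(k_neighbors, len(seeds) - 1)
    nn = np.argsort(dist, axis=1)[:, :k]
    idx = rng.choice(len(seeds), size=n_new, p=probs)
    nbr = nn[idx, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = seeds[idx] + gap * (seeds[nbr] - seeds[idx])

    X_bal = np.vstack([X, synthetic])
    y_bal = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return X_bal, y_bal, reliable
```

Calling `gmm_smote(X, y)` returns the original rows followed by synthetic defective rows until both classes have equal counts, together with the boolean mask of samples kept as reliable seeds; the balanced arrays would then train any defect classifier. In this sketch the filtered-out defective samples stay in the training set and are only excluded as interpolation seeds; dropping them entirely is an equally plausible reading of the abstract.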

Keywords: software defect detection; data imbalance; oversampling; Gaussian mixture model

CLC number: TP391 (Automation and Computer Technology: Computer Application Technology)

 
