Authors: TU Jiping, QIAN Ye, WANG Wei [1,2], FAN Daoyuan, ZHANG Hanyu
Affiliations: [1] School of Software, Yunnan University, Kunming 650500, China; [2] Key Laboratory for Software Engineering of Yunnan Province, Kunming 650500, China; [3] School of Big Data (Information Engineering), Yunnan Agricultural University, Kunming 650201, China
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2020, No. 2, pp. 215-235 (21 pages)
Funding: National Natural Science Foundation of China, No. 61462092
Abstract: Building a software defect prediction model from a large number of metrics can degrade the model's performance, because some of those metrics are irrelevant features. The feature selection step in defect prediction builds the model on a reduced subset of the features, with the aim of compressing the feature dimension, improving prediction accuracy, reducing model complexity, and saving computing resources. Traditional feature ranking methods evaluate only the influence of each single feature on the class label, so the resulting prediction models are less effective; feature subset selection methods must search over all feature subsets, which consumes computing resources, and they tend to select many features. To address these problems, this paper proposes a feature selection method based on the extended Bayesian information criterion (EBIC-FS). The method fits a linear regression to the data and selects the feature model that has a small residual sum of squares and few feature dimensions. Experiments on the public datasets M&R and Promise show that the method compresses the feature dimension effectively and that the resulting prediction model performs considerably better than those built with 5 baseline methods.
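Keywords: software defect prediction; feature selection; extended Bayesian information criterion; best feature subset
Classification: TP311.5 [Automation and Computer Technology: Computer Software and Theory]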
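
Illustrative sketch (not taken from the paper): the abstract says that EBIC-FS fits a linear regression and prefers feature models with a small residual sum of squares and few dimensions, but this record does not give the exact procedure. The Python sketch below shows one way such a criterion could drive feature selection. It assumes the commonly used form of the extended Bayesian information criterion for Gaussian linear regression, EBIC = n ln(RSS/n) + k ln(n) + 2γ ln C(p, k), where n is the number of samples, p the number of candidate features, k the number of selected features, and γ a tuning parameter. The greedy forward search, the value γ = 1, the synthetic data, and all function names (`solve`, `rss`, `log_binom`, `ebic`, `forward_select`, `make_demo_data`) are assumptions made for illustration only.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(X, y, cols):
    """Residual sum of squares of a least-squares fit of y on an intercept plus columns `cols`."""
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(rows[0])
    # Normal equations (A^T A) beta = A^T y; adequate for small, non-collinear problems.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(ata, aty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def log_binom(p, k):
    """Natural log of the binomial coefficient C(p, k)."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(X, y, cols, gamma=1.0):
    """Assumed EBIC form: n*ln(RSS/n) + k*ln(n) + 2*gamma*ln C(p, k)."""
    n, p, k = len(y), len(X[0]), len(cols)
    fit = max(rss(X, y, cols), 1e-12)  # guard against log(0) on a perfect fit
    return n * math.log(fit / n) + k * math.log(n) + 2.0 * gamma * log_binom(p, k)


def forward_select(X, y, gamma=1.0):
    """Greedy forward search (an assumption): add the feature that lowers EBIC most, stop when none does."""
    selected = []
    best = ebic(X, y, selected, gamma)
    remaining = set(range(len(X[0])))
    while remaining:
        score, j = min((ebic(X, y, selected + [j], gamma), j) for j in remaining)
        if score >= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return sorted(selected), best


def make_demo_data(n=200, p=8, seed=0):
    """Synthetic data (hypothetical): only columns 1 and 4 influence the numeric target."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3.0 * r[1] - 2.0 * r[4] + rng.gauss(0, 1) for r in X]
    return X, y


X, y = make_demo_data()
selected, score = forward_select(X, y)
print("selected columns:", selected, "EBIC:", round(score, 2))
```

The two penalty terms are what push the search toward low-dimensional models: a feature is kept only if it reduces n ln(RSS/n) by more than the increase in k ln(n) + 2γ ln C(p, k). On real defect data the target would be a defect label or count encoded numerically, and the metric columns would replace the synthetic features used here.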