Authors: ZHANG Yan-mei (张艳梅) [1,2]; ZHI Sheng-lin (植胜林); JIANG Shu-juan (姜淑娟) [1,2]; YUAN Guan (袁冠) [1,2]
Affiliations: [1] Mine Digitization Engineering Research Center of the Ministry of Education, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China; [2] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China; [3] KeHua Data Co., Ltd., Shenzhen, Guangdong 518055, China
Source: Acta Electronica Sinica (《电子学报》), 2023, No. 8, pp. 2076-2087 (12 pages)
Funding: National Natural Science Foundation of China (No. 61673384, No. 71774159); China Postdoctoral Science Foundation Special Funding (No. 2021T140707).
Abstract: This paper proposes a method for analyzing how class imbalance affects the stability and predictive performance of software defect prediction models. First, undersampling is used to construct, from the original dataset, a set of new datasets whose imbalance ratios are lower than that of the original. A fixed random seed is used during construction so that datasets built from the same source at the same imbalance ratio contain identical data, which reduces run-to-run randomness. Second, with the Matthews correlation coefficient (MCC) as the performance metric, each new dataset is fed to the model's classification algorithm for training, prediction, and evaluation, yielding MCC values for the current dataset at different imbalance ratios. A performance stability indicator is also proposed. The experimental results show that, compared with AUC, MCC is better suited as a stability evaluation metric for software defect prediction models under class imbalance, and that cost-sensitive models deliver more stable defect prediction performance than ensemble models.
Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]
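The two concrete steps named in the abstract, seeded undersampling to a lower imbalance ratio and scoring with MCC, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `undersample` and `mcc`, the label convention (1 = defective minority class, 0 = non-defective majority class), and the definition of imbalance ratio as majority count divided by minority count are all assumptions made for illustration. The stability indicator is not defined in the abstract, so it is not sketched here.

```python
import math
import random


def undersample(labels, target_ratio, seed=0):
    """Return sorted indices of a subset with a lower imbalance ratio.

    Assumption: the imbalance ratio is (majority count) / (minority count).
    All minority samples (label 1) are kept; majority samples (label 0)
    are randomly dropped until the target ratio is reached. A fixed seed
    makes the subset identical on every run.
    """
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(majority), round(target_ratio * len(minority)))
    rng = random.Random(seed)  # fixed seed: same input gives the same subset
    return sorted(minority + rng.sample(majority, keep))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 90          # original imbalance ratio 9:1
    for ratio in (1.0, 3.0, 5.0):         # lower ratios than the original
        idx = undersample(labels, ratio, seed=42)
        print(ratio, len(idx))
```

In a full experiment, each index subset would select the training data for a classifier, and `mcc` would be evaluated on that classifier's predictions at every imbalance ratio.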