Interpretability-oriented active learning approach for software defect prediction


Authors: WANG Yue; LI Yong[1,2]; ZHANG Wenjing[1]

Affiliations: [1] College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, Xinjiang, China; [2] Key Laboratory of Safety-Critical Software, Ministry of Industry and Information Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China

Source: Modern Electronics Technique (《现代电子技术》), 2024, No. 20, pp. 101-108 (8 pages)

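Funding: Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01A225); Key Research and Development Program of Xinjiang Uygur Autonomous Region (2022B01007-1).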

Abstract: To address the high cost of data annotation and the lack of interpretability of deep learning models in software defect prediction, an interpretability-oriented active learning approach for software defect prediction is proposed. First, based on active learning, a sample selection strategy picks highly uncertain samples from the target project for expert annotation, and the annotated samples are added to the source project to train the predictor. Second, the selected samples are perturbed using domain knowledge to construct a local dataset, and a linear model is fitted on this dataset to simulate the behavior of the data selection strategy, thereby making the model interpretable. Experimental results show that the approach outperforms traditional active learning baseline methods on data annotation metrics. In terms of interpretability, its RMSE is also lower than that of LIME, the global surrogate model, and RuleFit, so it explains the "black-box" model well. The approach not only effectively improves the annotation efficiency of software defect data but also achieves model interpretability.
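The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the predictor `black_box_proba` is a hypothetical stand-in with made-up weights, entropy is assumed as the uncertainty measure, and Gaussian noise replaces the domain-knowledge perturbation that the paper describes. Adding the annotated samples to the source project and retraining the predictor is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box_proba(X):
    """Hypothetical stand-in for a trained defect predictor: P(defective)."""
    w = np.array([1.5, -2.0, 0.5])  # illustrative weights, not from the paper
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def uncertainty(X):
    """Binary entropy of the predicted probability (higher = less certain)."""
    p = np.clip(black_box_proba(X), 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def select_uncertain(X_pool, k):
    """Indices of the k most uncertain unlabeled samples (sent to the expert)."""
    return np.argsort(-uncertainty(X_pool))[:k]

def explain_locally(x, n_perturb=500, scale=0.1):
    """Fit a linear surrogate to the selection score in a neighborhood of x.

    Assumption: Gaussian perturbation; the paper perturbs with domain knowledge.
    Returns per-feature weights, the intercept, and the surrogate's RMSE.
    """
    Z = x + rng.normal(0.0, scale, size=(n_perturb, x.size))  # local dataset
    y = uncertainty(Z)                                        # strategy output
    A = np.hstack([Z, np.ones((n_perturb, 1))])               # intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rmse = float(np.sqrt(np.mean((A @ coef - y) ** 2)))
    return coef[:-1], coef[-1], rmse

X_pool = rng.normal(size=(200, 3))   # synthetic "target project" features
picked = select_uncertain(X_pool, k=5)
weights, intercept, rmse = explain_locally(X_pool[picked[0]])
print("selected indices:", picked)
print("local feature weights:", weights)
print("surrogate RMSE:", rmse)
```

Here the RMSE measures how closely the linear surrogate reproduces the selection score on the perturbed samples, which is the sense in which a lower RMSE indicates a more faithful explanation.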

Keywords: software defect prediction; active learning; interpretability; data annotation; data selection strategy; deep learning

Classification number: TN919-34 [Electronics and Telecommunications: Communication and Information Systems]

 
