机构地区:[1]南京航空航天大学计算机科学与技术学院,南京211106
出 处:《计算机科学》2024年第7期1-9,共9页Computer Science
基 金:国家自然科学基金(62002161,62272225);南京航空航天大学前瞻布局科研专项资金。
摘 要:为了加快开发人员定位软件缺陷,研究人员提出了一系列基于文本检索的缺陷定位技术,自动为用户所提交的缺陷报告推荐可疑的代码文件。由于用户的专业知识不同,编写的缺陷报告质量不一致,因此某些低质量的缺陷报告无法被成功定位。对低质量的缺陷报告进行重构从而改进其定位效果,是常见的解决方案。现有基于查询扩展和查询缩减的主流重构方法,容易出现重构前后查询主题不一致或所依赖伪相关库质量差导致重构质量低的问题。对此,提出了一种基于主题一致性保持和伪相关反馈库扩展的缺陷报告重构方法,由主题一致性保持的查询缩减阶段和伪相关反馈库扩展的查询扩展阶段两部分组成。查询缩减阶段将缺陷报告的概要问题描述和从问题描述文本中提取的关键词合并来解决主题不一致性问题;查询扩展阶段综合使用多种定位工具(即Lucene, BugLocator和Blizzard)来获得伪相关反馈库,并从中提取查询扩展关键词,以解决现有伪相关反馈库质量差导致的重构质量低的问题;最后将查询缩减和扩展阶段的输出合并得到重构后的查询。通过在6个Java项目上进行实验发现,对于使用现有缺陷定位方法无法在TOP 10可疑推荐文件中定位的低质量缺陷报告,使用所提重构方法后,能定位其中21%~39%的缺陷即Accuracy@10,MRR@10为10%~16%。对比现有重构技术,所提重构方法在Accuracy@10和MRR@10两个指标上分别可以提升7%~32%和2%~13%。To enhance the speed of locating software bugs for developers,a set of bug location techniques based on text retrieval has been proposed.These techniques aim to automatically recommend potentially suspicious code files associated with bug reports submitted by users.However,due to varying levels of professional expertise among users,the quality of bug reports tends to be inconsistent.As a result,some low-quality bug reports cannot be successfully located.To improve the quality of those bug reports,it is common to refactor the bug reports.Existing mainstream methods for reformulation,which involve query extension and query reduction,often face issues such as inconsistent query topics before and after reformulation or the utilization of poor-quality pseudo-correlation libraries.To address this problem,this paper proposes a bug report reformulation method that focuses on maintaining topic consistency and extending pseudo-correlation feedback libraries.This method consists of two parts:the query reduction stage,which aims to maintain topic consistency through combining a concise problem description with keywords extracted from the text,and the query expansion stage,which involves using various locating tools(Lucene,BugLocator,and Blizzard)to comprehensively obtain a pseudo-correlation feedback library.From this library,additional keywords for query expansion are extracted to address the issue of low reformulation quality caused by the inadequacy of the existing pseudo-correlation feedback library.Ultimately,the outputs of the query reduction and expansion stages are combined to form the reformulated query.Through experiments conducted on six Java projects,it is discovered that for low-quality bug reports that could not be identified among the top 10 recommended files using the existing bug location method,21%~39%of them can be located using the proposed reformulation method,i.e.,Accuracy@10 and MRR@10 is 10%~16%.Compared with existing reformulation techniques,the Accuracy@10 and MRR@10 of the proposed reformulation me
关 键 词:缺陷定位 查询重构 查询缩减 查询扩展 伪相关反馈库 缺陷报告质量
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...