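Affiliations: [1] Tianjin Key Laboratory of Advanced Networking (Tianjin University), Tianjin 300050, China; [2] School of Computer Science and Technology, Nantong University, Nantong, Jiangsu 226019, China; [3] State Key Laboratory of Information Security (Institute of Information Engineering, Chinese Academy of Sciences), Beijing 100093, China; [4] School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798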
Source: Journal of Software (《软件学报》), 2019, No. 10, pp. 3071-3089 (19 pages)
Funding: National Natural Science Foundation of China (61572349, 61272106)
Abstract: Software testing is a common way to assure software quality, and achieving high coverage is an important and challenging research problem in testing. Fuzz testing and symbolic execution, two mainstream testing techniques, have been widely studied and applied in both academia and industry, and each has its own strengths and limitations. Fuzz testing generates test cases by random mutation and executes the program dynamically, so it can reach and cover deep branches, but mutation alone rarely produces test cases that cover branches guarded by complex conditions. Symbolic execution relies on a constraint solver and can generate test cases that cover such complex conditional branches, but it often suffers from state explosion and therefore has difficulty covering deep branches. Prior work has shown that combining symbolic execution with fuzz testing can achieve better results than using either technique alone. After analyzing the strengths and weaknesses of the two techniques, this study proposes Afleer, a hybrid testing approach that combines them on the basis of branch coverage in order to generate test cases with higher branch coverage. Specifically, fuzz testing (e.g., AFL) quickly generates a large number of test cases that cover deep branches, and symbolic execution (e.g., KLEE) searches based on the coverage information from fuzz testing and generates test cases only for branches that remain uncovered. To evaluate the effectiveness of Afleer, the study uses the standard benchmark LAVA-M and the real-world project oSIP as evaluation subjects, with bug detection capability and coverage as the evaluation metrics. The experimental results show that: (1) for bug detection, Afleer found 755 bugs in total, while AFL found only 1; (2) for coverage, Afleer achieved improvements of varying degree on both the benchmark and the real-world project. On oSIP, Afleer improves on AFL by 2.4 times in branch coverage and by 6.1 times in path coverage. In addition, Afleer detected a previously unknown bug in oSIP.
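The division of labor described in the abstract can be illustrated with a toy sketch. This is not Afleer's implementation and it does not call AFL or KLEE; it only assumes the general idea stated above: a mutation phase gathers coverage first, then a solver-like phase is asked for inputs only for branches that are still uncovered. The program `target`, the constant `MAGIC`, the branch names, and the lookup table that stands in for a constraint solver are all hypothetical.

```python
import random

MAGIC = 0x5EED1234  # hypothetical value guarding a "complex" branch

ALL_BRANCHES = {"even", "odd", "magic", "not_magic"}


def target(x):
    """Toy program under test; returns the set of branch IDs it executes."""
    covered = set()
    covered.add("even" if x % 2 == 0 else "odd")
    # An equality check against a 32-bit constant is hard to hit by mutation.
    covered.add("magic" if x == MAGIC else "not_magic")
    return covered


def fuzz(seeds, rounds, rng):
    """Stand-in for the fuzzing phase: flip one random bit of a corpus
    entry and keep the mutant only if it reaches a new branch."""
    corpus = list(seeds)
    covered = set()
    for s in corpus:
        covered |= target(s)
    for _ in range(rounds):
        child = rng.choice(corpus) ^ (1 << rng.randrange(32))
        new = target(child) - covered
        if new:
            corpus.append(child)
            covered |= new
    return corpus, covered


# Stand-in for a constraint solver: a fixed table of satisfying inputs.
# A real symbolic executor would derive these from path constraints.
SOLUTIONS = {"even": 0, "odd": 1, "magic": MAGIC, "not_magic": 0}


def solve_for(branch):
    """Return an input that reaches `branch`, or None if unknown."""
    return SOLUTIONS.get(branch)


def hybrid(seeds, rounds=200, seed=0):
    """Fuzz first, then request inputs only for still-uncovered branches."""
    rng = random.Random(seed)
    corpus, covered = fuzz(seeds, rounds, rng)
    for branch in sorted(ALL_BRANCHES - covered):
        x = solve_for(branch)
        if x is not None:
            corpus.append(x)
            covered |= target(x)
    return corpus, covered
```

In this toy setting, mutation alone never reaches the `magic` branch from the seed `0`, because only coverage-increasing mutants enter the corpus, so every mutant differs from zero in at most two bits. The second phase supplies the missing input directly, which mirrors the complementarity the abstract argues for.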
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]