Authors: LIU Jiahao (刘家豪); JIANG He (江贺)[1] (School of Software, Dalian University of Technology, Dalian, Liaoning 116600, China)
Source: Computer Science (《计算机科学》), 2024, No. 12, pp. 53-62 (10 pages)
Abstract: PDF is a widely used and important document format. Because PDF files are complex, defects in PDF-related applications can lead to serious consequences such as exposure to malicious attacks and incorrect rendering of information. Testing PDF-related applications has therefore become an active research topic. The most effective method at present is grammar-based fuzzing, but it often requires a large amount of manual work to summarize and write complex grammar rules, which seriously hinders efficient automated test case generation. Deep learning offers a feasible way past this obstacle, but the test cases generated by current methods are generally of low quality and have poor bug-finding ability. Improving on them requires addressing three main challenges: filtering the data set, balancing gains in test case coverage against growth in test case size, and mutating test cases efficiently. This paper therefore proposes DeepGenFuzz, a deep learning-based framework for efficiently generating fuzzing test cases for PDF applications. It uses models such as CNN, Seq2Seq, and Transformer to generate high-quality PDF test cases through four steps: data filtering, object generation, object appending, and efficient mutation. Evaluations on PDF applications such as MuPDF show that the test cases generated by DeepGenFuzz achieve significantly higher average code coverage than state-of-the-art tools such as Learn&Fuzz and IUST-DeepFuzz, reaching up to 8.12%~61.03%. Its bug-finding ability is also far superior to that of Learn&Fuzz and IUST-DeepFuzz. So far, 31 previously unreported bugs found in seven of the most popular PDF applications have been reported, of which 25 have been confirmed or fixed, covering all tested programs.
Keywords: PDF applications; deep learning; fuzz testing; test cases; code coverage
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]