Authors: 张际灿 (ZHANG Ji-can); 姚锟彬 (YAO Kun-bin); 薛磊 (XUE Lei); 王晨 (WANG Chen); 聂黎明 (NIE Li-ming)
Affiliations: [1] Wuhan Research Institute of Posts and Telecommunications, Wuhan, Hubei 430074, China [2] Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China [3] Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China [4] Nanyang Technological University, Singapore 699010, Singapore
Source: Computer Technology and Development (《计算机技术与发展》), 2024, No. 7, pp. 62-68 (7 pages)
Funding: National Natural Science Foundation of China (62002306, 61972359).
Abstract: Binary Code Similarity Detection (BCSD) plays an important role in applications such as reverse engineering, vulnerability detection, malware detection, software plagiarism detection, and patch analysis. Most existing research has focused on control-flow embedding of binary functions and on embedding low-level code with Natural Language Processing (NLP) techniques. However, a function at run time carries not only control-flow information but also data-flow semantic information, so abstracting the semantic features of a function comprehensively is crucial. To this end, the paper proposes BS-DD, a framework for judging binary function similarity that integrates control flow and data dependency relationships. Semantic information is extracted by simulating the execution of the binary code, and a simplification algorithm is used to construct a data dependency graph. A graph neural network then performs the similarity judgment. Seven widely used software packages from the open-source community are compiled under different combinations of settings, and three distinct task scenarios, together with a real-world vulnerability detection experiment, are designed on this basis to compare BS-DD with the latest data-flow-based BCSD methods. The experimental results show that the model achieves significant improvements in recall and Mean Reciprocal Rank (MRR). In real-world vulnerability detection, the model also consistently outperforms the other methods.
Keywords: binary; data dependency; similarity detection; graph neural network; semantic information; vulnerability detection
Classification: TP302 [Automation and Computer Technology: Computer System Architecture]
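Note on the metrics: the abstract reports recall and Mean Reciprocal Rank (MRR) without defining them. The following is a minimal sketch of how these two metrics are commonly computed in a function-retrieval setting, where each query function yields a ranked list of candidate functions and one known true match. It is not taken from the paper; the function names, the one-true-match-per-query assumption, and the use of Recall@k are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Each query is (ranked candidate IDs, ID of the true match).
# Assumption for this sketch: exactly one true match per query.
Query = Tuple[Sequence[str], str]


def reciprocal_rank(ranked_ids: Sequence[str], target_id: str) -> float:
    """Return 1/rank of the true match (1-based), or 0.0 if it is not ranked."""
    for rank, candidate in enumerate(ranked_ids, start=1):
        if candidate == target_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: List[Query]) -> float:
    """Average the reciprocal rank over all queries (0.0 for an empty list)."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, t) for r, t in queries) / len(queries)


def recall_at_k(queries: List[Query], k: int) -> float:
    """Fraction of queries whose true match appears in the top k candidates."""
    if not queries:
        return 0.0
    hits = sum(1 for ranked, target in queries if target in ranked[:k])
    return hits / len(queries)


if __name__ == "__main__":
    # Hypothetical rankings for three query functions.
    demo = [
        (["f1", "f2", "f3"], "f1"),  # true match ranked first
        (["f4", "f5", "f6"], "f5"),  # true match ranked second
        (["f7", "f8", "f9"], "f0"),  # true match not retrieved
    ]
    print("MRR:", mean_reciprocal_rank(demo))
    print("Recall@1:", recall_at_k(demo, 1))
```

MRR rewards placing the true match near the top of the ranking, while Recall@k only checks whether it appears within the first k results, so the two are usually reported together.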