Class Information Recovery Technology for COTS C++ Binaries


Authors: YANG Jin; GONG Xiaorui [1,2]; WU Wei; ZHANG Bolun [1,2]

Affiliations: [1] Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; [2] School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China

Source: Journal of Cyber Security (《信息安全学报》), 2024, No. 3, pp. 138-156 (19 pages)

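Funding: Supported by the Beijing Municipal Science and Technology Program project on building a training and support platform for special-skills talent in cyberspace attack and defense (No. Z181100002718002).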

Abstract: Software written in C++ has long been one of the hardest targets in binary reverse engineering. Compiled binaries no longer retain C++ classes or their inheritance information, and release builds enable compiler optimization by default, which strips away much of the residual information that would otherwise remain. This makes reverse analysis of commercial off-the-shelf (COTS) C++ binaries particularly difficult. Existing work has two shortcomings. First, it does not fully account for compiler optimization, so recognition rates for classes and their inheritance relationships are low on optimized binaries, and complex inter-class relationships such as virtual inheritance are hard to identify. Second, the recognition algorithms are inefficient and cannot meet the needs of reverse analysis of large software. This paper studies the identification of classes and their inheritance relationships in C++ binaries under compiler optimization and makes improvements in three respects. First, it uses inter-procedural static taint analysis to extract object memory layouts from C++ binaries, which effectively resists the effects of compiler optimization (constructor inlining). Second, it introduces four heuristics that recover lost information from optimized C++ binaries. Third, it develops an adaptive control flow graph (CFG) generation algorithm that greatly improves analysis efficiency at minimal loss. On this basis, a prototype system, RECLASSIFY, is implemented; it effectively identifies polymorphic classes and class inheritance relationships (including virtual inheritance) from C++ binaries. Experiments show that under both the MSVC ABI and the Itanium ABI, RECLASSIFY identifies most polymorphic classes and recovers class relationships from optimized binaries in a relatively short time. On a dataset of C++ binaries from 15 real-world software packages (O2 compiler optimization), RECLASSIFY achieves an average recall of 84.36% for polymorphic class recovery under the MSVC ABI, whereas the previous state-of-the-art solution, OOAnalyzer, achieves an average recall of only 33.76%. In addition, compared with OOAnalyzer, RECLASSIFY improves analysis efficiency by three orders of magnitude.

Keywords: binary analysis; class inheritance relationship recovery; static taint analysis; adaptive CFG generation algorithm

Classification code: TP311.5 [Automation and Computer Technology: Computer Software and Theory]

 
