Fast Cross-Platform Binary Code Similarity Detection Framework Based on CFGs Taking Advantage of NLP and Inductive GNN  

Authors: Jinxue PENG, Yong WANG, Jingfeng XUE, Zhenyan LIU

Affiliation: [1] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China

Source: Chinese Journal of Electronics (电子学报(英文版)), 2024, Issue 1, pp. 128-138 (11 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant No. 62172042) and the Major Scientific and Technological Innovation Projects of Shandong Province (Grant No. 2020CXGC010116).

Abstract: Cross-platform binary code similarity detection aims to determine whether two or more pieces of binary code are similar. Existing approaches that combine control flow graph (CFG)-based function representation with graph convolutional network (GCN)-based similarity analysis are the best-performing ones. However, because of the large amount of convolutional computation and the loss of structural information, convolutional networks bring high overhead and are sometimes inaccurate. To address these issues, we propose a fast cross-platform binary code similarity detection framework that uses natural language processing (NLP) for basic block embedding and an inductive graph neural network (GNN) for function representation, simulating the extraction of structural and temporal features. The node-centric, mini-batch training of GNNs is well suited to large CFGs and can greatly reduce computational overhead. Various NLP basic block embedding models and GNNs are evaluated. Experimental results show that the scheme with long short-term memory (LSTM) for basic block embedding and inductive learning-based GraphSAGE (GAE) for function representation outperforms state-of-the-art works. Our framework takes only 45% overhead, improving efficiency significantly with a small performance trade-off.
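The pipeline described in the abstract (basic block embeddings, then neighborhood aggregation over the CFG, then function-level comparison) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the basic block vectors below are hand-written stand-ins for what an LSTM would produce, the weights are fixed identity matrices rather than learned parameters, all successors are aggregated instead of a sampled subset, and every name (`sage_layer`, `function_embedding`, `cosine`, the toy CFGs) is hypothetical.

```python
import math

def sage_layer(features, successors, w_self, w_neigh):
    """One GraphSAGE-style layer: mean-aggregate neighbor vectors,
    combine with the node's own vector, apply ReLU, then L2-normalize."""
    out = {}
    for node, h in features.items():
        nbrs = successors.get(node, [])
        dim = len(h)
        if nbrs:
            agg = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
        else:
            agg = [0.0] * dim  # no successors: nothing to aggregate
        z = [
            sum(w_self[r][i] * h[i] + w_neigh[r][i] * agg[i] for i in range(dim))
            for r in range(len(w_self))
        ]
        z = [max(0.0, v) for v in z]  # ReLU
        norm = math.sqrt(sum(v * v for v in z)) or 1.0
        out[node] = [v / norm for v in z]
    return out

def function_embedding(features, successors, w_self, w_neigh):
    """Embed a whole function by mean-pooling its node embeddings."""
    h = sage_layer(features, successors, w_self, w_neigh)
    dim = len(next(iter(h.values())))
    return [sum(vec[i] for vec in h.values()) / len(h) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two function embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Assumption: identity weights stand in for trained parameters.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

# Toy CFGs: node -> stand-in basic block embedding, node -> successor list.
blocks_a = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edges_a = {0: [1, 2], 1: [2], 2: []}
blocks_b = dict(blocks_a)          # same function compiled for another platform (idealized)
edges_b = dict(edges_a)
blocks_c = {0: [1.0, 0.0], 1: [1.0, 0.0]}
edges_c = {0: [1], 1: [0]}         # an unrelated two-block loop

emb_a = function_embedding(blocks_a, edges_a, IDENTITY, IDENTITY)
emb_b = function_embedding(blocks_b, edges_b, IDENTITY, IDENTITY)
emb_c = function_embedding(blocks_c, edges_c, IDENTITY, IDENTITY)

sim_same = cosine(emb_a, emb_b)    # identical CFGs
sim_other = cosine(emb_a, emb_c)   # structurally different CFGs
```

Because each node's embedding depends only on its own features and those of its neighbors, the same layer can be applied to CFGs never seen during training, which is the inductive property the abstract relies on.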

Keywords: Control flow graph; Natural language processing; Inductive graph neural network; Binary code similarity detection

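Classification: TP311.5 [Automation and Computer Technology: Computer Software and Theory]; TP309 [Automation and Computer Technology: Computer Science and Technology]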

 
