Cross-architecture Cryptographic Algorithm Recognition Based on IR2Vec


Authors: ZHAO Chenxia (赵晨霞), SHU Hui (舒辉)[2], SHA Zihan (沙子涵)

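Affiliations: [1] School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450001, China; [2] State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou 450001, China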

Source: Computer Science (《计算机科学》), 2023, Issue S01, pp. 720-726 (7 pages)

Funding: National Key Research and Development Program of China (2019QY1305).

Abstract: In information security, encryption is used to protect data, so identifying the cryptographic algorithms contained in executable files is important for security analysis. Most existing cryptographic algorithm recognition techniques target a single architecture and perform poorly in cross-architecture scenarios. This paper therefore proposes the IR2Vec model for cross-architecture cryptographic algorithm recognition. The model handles the cross-architecture problem by exploiting LLVM's design, in which a shared intermediate representation connects different front ends and back ends. Executables are first decompiled into LLVM intermediate representation (IR) with LLVM-RetDec; an improved PV-DM (Distributed Memory Paragraph Vector) model then embeds the IR as semantic vectors, and semantic similarity is judged by the cosine distance between these vectors. A cryptographic algorithm library is built from a variety of cryptographic algorithms; the target executable is compared with every file in the library one by one, and the most similar file is taken as the recognition result. Experimental results show that the technique effectively identifies cryptographic algorithms in executables, and that the model supports cross-recognition of binaries across three architectures (X86, ARM and MIPS), two compilers (Clang and GCC) and four optimization levels (O0, O1, O2 and O3).
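The matching step described in the abstract (cosine comparison against every library file, keeping the most similar one) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: it assumes each file has already been turned into a fixed-length vector by the improved PV-DM embedding, which is not reproduced here, and the function names cosine_similarity, cosine_distance and identify, the toy 3-dimensional vectors and the library labels are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v: 1.0 means same direction."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u: Vector, v: Vector) -> float:
    """Cosine distance = 1 - cosine similarity (0.0 means same direction)."""
    return 1.0 - cosine_similarity(u, v)


def identify(target: Vector, library: List[Tuple[str, Vector]]) -> Tuple[str, float]:
    """Compare the target vector with every (label, vector) library entry and
    return the label and similarity of the most similar entry.

    Hypothetical helper: the vectors are assumed to come from an embedding
    step (PV-DM in the paper) that is not implemented here.
    """
    if not library:
        raise ValueError("the algorithm library is empty")
    best_label, best_vec = max(
        library, key=lambda entry: cosine_similarity(target, entry[1])
    )
    return best_label, cosine_similarity(target, best_vec)


if __name__ == "__main__":
    # Toy vectors for illustration only; real embeddings are much larger.
    library = [
        ("AES", [0.9, 0.1, 0.0]),      # hypothetical library file vector
        ("SHA-256", [0.1, 0.8, 0.3]),  # hypothetical library file vector
        ("RC4", [0.0, 0.2, 0.9]),      # hypothetical library file vector
    ]
    target = [0.85, 0.15, 0.05]        # hypothetical vector of an unknown binary
    label, sim = identify(target, library)
    print(f"recognized as {label} (cosine similarity {sim:.3f})")
```

Because cosine distance is defined here as 1 minus cosine similarity, taking the entry with the highest similarity is equivalent to taking the entry with the smallest cosine distance. The library may hold several files per algorithm (for example, one per architecture, compiler and optimization level); the label of whichever file scores best becomes the result.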

Keywords: similarity recognition; cross-architecture; cryptographic algorithm; LLVM

CLC number: TP393 [Automation and Computer Technology: Computer Application Technology]