An Anti-Obfuscation Method for Detecting Similarity Among Android Applications in Large Scale  Cited by: 9

Original title: 一种抗混淆的大规模Android应用相似性检测方法

Authors: Jiao Sibei (焦四辈)[1], Ying Lingyun (应凌云)[1], Yang Yi (杨轶)[1], Cheng Yao (程瑶)[1], Su Purui (苏璞睿)[1], Feng Dengguo (冯登国)[1]

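Affiliation: [1] Trusted Computing and Information Assurance Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190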

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2014, No. 7, pp. 1446-1457 (12 pages)

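Funding: National Basic Research Program of China (973 Program) (2012CB315804); Major Research Plan of the National Natural Science Foundation of China (91118006); National Natural Science Foundation of China (61073179); Beijing Natural Science Foundation (4122086)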

Abstract: As code obfuscation and packing become more widespread, similarity detection among Android applications based on behavioral characteristics is increasingly affected. We propose an anti-obfuscation method for large-scale similarity detection among Android applications that computes application similarity from the content features of specific files inside each application. The method is unaffected by code obfuscation and effectively resists the interference introduced by file obfuscation. Based on statistics of the file types found in 59,000 applications, we select image, audio, and layout files as feature files because they are common, representative, and measurable. For each of the three file types we propose a dedicated content feature extraction method and similarity measure, and we assign weights to the resulting similarities through learning, which further improves detection accuracy. We also implement a prototype system and optimize each step to speed up the computation, making it suitable for large-scale scenarios. Using legitimate applications and known malicious applications as references, we run similarity detection experiments on the 59,000 applications. The results show that similarity detection based on file content accurately identifies repackaged applications and applications containing known malicious code, and that it outperforms existing approaches in both efficiency and accuracy.

Keywords: file content features; fuzzy hashing; perceptual features; Android; application similarity; anti-obfuscation

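Classification: TP309 [Automation and Computer Technology: Computer System Architecture]; TP181 [Automation and Computer Technology: Computer Science and Technology]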
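
Illustration: the abstract describes scoring each of three file types (image, audio, layout) separately and then combining the scores with learned weights. The record does not give the actual feature extractors, similarity measures, or weight values, so the following Python sketch is only a rough illustration under stated assumptions: each file is represented by an opaque digest string, per-type similarity is approximated with Jaccard overlap as a stand-in for the paper's fuzzy-hash and perceptual measures, and the names `WEIGHTS`, `jaccard`, and `app_similarity` as well as the weight values are hypothetical.

```python
from typing import Dict, Set

# Hypothetical placeholder weights. The paper learns its weights;
# these values are NOT taken from it.
WEIGHTS: Dict[str, float] = {"image": 0.5, "audio": 0.2, "layout": 0.3}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two digest sets (assumed stand-in for the paper's measures)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def app_similarity(
    app_a: Dict[str, Set[str]],
    app_b: Dict[str, Set[str]],
    weights: Dict[str, float] = WEIGHTS,
) -> float:
    """Weighted average of per-type similarities.

    File types that neither app contains are skipped, and the remaining
    weights are renormalized so the score stays in [0, 1].
    """
    total = 0.0
    weight_sum = 0.0
    for file_type, weight in weights.items():
        files_a = app_a.get(file_type, set())
        files_b = app_b.get(file_type, set())
        if not files_a and not files_b:
            continue  # no evidence for this file type in either app
        total += weight * jaccard(files_a, files_b)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Usage: a repackaged app that swaps one image but keeps the layouts.
original = {"image": {"a", "b", "c", "d"}, "layout": {"x", "y"}}
repackaged = {"image": {"a", "b", "c", "e"}, "layout": {"x", "y"}}
score = app_similarity(original, repackaged)
```

Because the score depends only on resource files, renaming or obfuscating classes in the code would leave it unchanged, which is the property the abstract emphasizes. A real system would replace the exact-digest comparison with measures that tolerate small changes to a file.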

 
