Improved Method of Computer Virus Signature Automatic Extraction Based on N-Gram (基于N-Gram的计算机病毒特征码自动提取的改进方法)  Cited by: 8


Authors: Yang Yan (杨燕), Jiang Guoping (蒋国平)[2]

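Affiliations: [1] School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003; [2] College of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210003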

Source: Computer Science (《计算机科学》), 2017, Issue B11, pp. 338-341, 361 (5 pages)

Abstract: With the development and spread of computer technology, the damage caused by computer viruses is becoming increasingly serious. The traditional N-Gram algorithm has difficulty extracting features of different lengths, which leads to missing effective features and produces a huge feature set that wastes storage space. To address these problems, an improved N-Gram-based method for automatic signature extraction is proposed. Building on the original N-Gram feature extraction algorithm, the method introduces variable-length N-Gram features, extracts effective features of different lengths, and generates variable-length virus signatures. Taking the correlation of feature frequencies into account, it uses signature concentration to filter N-Gram features in a directed manner and generates a data dictionary, which saves storage space. Experimental results show that, compared with an algorithm that uses only fixed-length N-Grams, the proposed method effectively reduces the false positive rate of automatic signature extraction.

Keywords: N-Gram; virus signature; signature concentration; data dictionary

Classification: TP309.5 [Automation and Computer Technology: Computer System Architecture]
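
The abstract describes two steps: extracting N-Gram features of several lengths, then filtering them by "signature concentration" into a data dictionary. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not define signature concentration, so the score used here (fraction of malware samples containing an N-Gram minus the fraction of benign samples containing it) is an assumption, as are the function names, the length range `n_min`..`n_max`, and the `threshold` parameter.

```python
from collections import Counter


def extract_ngrams(data: bytes, n_min: int = 2, n_max: int = 4) -> Counter:
    """Count every byte n-gram of length n_min..n_max in one sample."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    return counts


def sample_frequency(samples, n_min: int = 2, n_max: int = 4) -> Counter:
    """For each n-gram, count the samples in which it occurs at least once."""
    freq = Counter()
    for sample in samples:
        freq.update(extract_ngrams(sample, n_min, n_max).keys())
    return freq


def build_dictionary(malware, benign, threshold: float = 0.5,
                     n_min: int = 2, n_max: int = 4) -> dict:
    """Keep n-grams whose (assumed) concentration score meets the threshold.

    Assumed score: share of malware samples containing the n-gram minus
    the share of benign samples containing it. This is a stand-in for the
    paper's signature concentration, which the abstract does not specify.
    """
    if not malware or not benign:
        raise ValueError("both sample sets must be non-empty")
    mal_freq = sample_frequency(malware, n_min, n_max)
    ben_freq = sample_frequency(benign, n_min, n_max)
    dictionary = {}
    for gram, mal_count in mal_freq.items():
        score = mal_count / len(malware) - ben_freq.get(gram, 0) / len(benign)
        if score >= threshold:
            dictionary[gram] = score
    return dictionary


if __name__ == "__main__":
    # Toy byte strings, invented for illustration only.
    malware = [b"\x90\x90\xeb\xfe", b"\x00\x90\x90\xeb"]
    benign = [b"\x00\x01\x02\x03"]
    for gram, score in build_dictionary(malware, benign, threshold=1.0,
                                        n_min=2, n_max=3).items():
        print(gram.hex(), score)
```

Because the dictionary mixes N-Grams of different lengths, a longer entry such as `90 90 eb` can coexist with its shorter substrings. A fuller version would presumably prune such overlaps to save space, but the abstract gives no detail on how that is done.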

 
