IMM4HT: an identification method of malicious mirror website for high-speed network traffic  Cited by: 5

Authors: ZHANG Lei; ZHANG Peng [2]; SUN Wei; YANG Xingdong; XING Lichao

Affiliations: [1] School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China; [2] Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; [3] School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; [4] School of Computer Science and Engineering, Beihang University, Beijing 100191, China

Source: Journal on Communications, 2019, No. 7, pp. 87-94 (8 pages)

Funding: National Key Research and Development Program of China (No. 2016YFB0801300); National Natural Science Foundation of China (No. 61602474, No. 61602467, No. 61702552)

Abstract: To address the problem that harmful information spreads through mirror websites and thereby bypasses inspection, an identification method of malicious mirror websites for high-speed network traffic is proposed. First, fragmented data is extracted from the traffic and the webpage source code is restored, with a standardization step added to improve identification accuracy. Next, the webpage source code is divided into blocks, a hash value is computed for each block with the simhash algorithm to obtain the simhash value of the webpage source code, and the Hamming distance is used to measure the similarity between webpage source codes. Finally, a snapshot of the webpage is taken, its SIFT feature points are extracted, a perceptual hash value of the snapshot is obtained through clustering analysis and mapping, and webpage similarity is computed from the perceptual hash values. Experiments on real traffic show that the method achieves an accuracy of 93.42%, a recall of 90.20%, an F value of 0.92, and a processing delay of 20 μs. The proposed method can effectively detect malicious mirror webpages in high-speed network traffic.
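The source-code comparison step in the abstract (block the page source, compute a simhash, compare by Hamming distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the blocking strategy, the per-block hash function, the fingerprint width, or the decision threshold, so the word-level blocks of size 4, the MD5-based block hash, the 64-bit width, and the function names `tokenize`, `simhash`, and `hamming_distance` are all assumptions made for illustration. Lowercasing and keeping only word characters stands in for the standardization step, whose details are likewise not given.

```python
import hashlib
import re


def tokenize(source: str, block_size: int = 4) -> list[str]:
    """Split page source into overlapping word blocks (assumed blocking scheme)."""
    # Lowercasing and keeping only word characters is a stand-in for the
    # paper's standardization step, which the abstract does not detail.
    words = re.findall(r"\w+", source.lower())
    if len(words) < block_size:
        return [" ".join(words)] if words else []
    return [
        " ".join(words[i:i + block_size])
        for i in range(len(words) - block_size + 1)
    ]


def simhash(source: str, bits: int = 64) -> int:
    """Compute a simhash fingerprint of the page source (assumed 64-bit)."""
    counts = [0] * bits
    for block in tokenize(source):
        # Assumption: MD5 truncated to `bits` bits serves as the block hash.
        digest = hashlib.md5(block.encode("utf-8")).digest()
        block_hash = int.from_bytes(digest[: bits // 8], "big")
        # Each block votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (block_hash >> i) & 1 else -1
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two pages would then be treated as mirror candidates when `hamming_distance(simhash(page_a), simhash(page_b))` falls below a threshold; the abstract gives no threshold, so any value used here would have to be tuned on labelled data.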

Keywords: malicious mirror website; simhash (similarity hash algorithm); webpage similarity

Classification code: TP309 [Automation and Computer Technology: Computer System Architecture]

 
