Authors: YU Guangxi (于光喜); ZHANG Yan (张棪) [1,2]; CUI Huajun (崔华俊); YANG Xinghua (杨兴华); LI Yang (李杨); LIU Chang (刘畅) [1,2]
Affiliations: [1] Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; [2] School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
Source: Journal of Cyber Security (《信息安全学报》), 2020, Issue 3, pp. 35-47 (13 pages)
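Funding: Supported by the Innovation Research Project of the Institute of Information Engineering, Chinese Academy of Sciences (No. J810091105) and the Outstanding Young Talent Introduction Program (No. Y6Z0011105).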
Abstract: Botnets widely use domain generation algorithms (DGAs) to generate large numbers of random domain names and evade detection. To address botnet DGA domain names, this paper designs and implements a DGA domain name detection system. The system first applies a lightweight classification module based on the random forest algorithm, which analyzes domain name character features to separate normal domain names from suspected malicious ones and meets the fast-detection requirement of practical use in live networks. It then applies a clustering module based on the X-means algorithm: building on the classification results, and using the character similarity and query-behavior similarity of DGA domain names, it further examines the suspected malicious domain names through clustering and set analysis to reduce the false positive rate. The system was implemented on Spark and used to process and analyze real DNS log data from an ISP's live network over 20 consecutive days. It detected about 2.5 million DGA domain names per day on average; regular-expression matching showed that about 55% of them belong to 5 known DGA families. In the first two experimental days, more than 13,000 regex-matched domain names, belonging to 3 DGA families, hit known DGA domain names. The results show that the system can effectively detect multiple kinds of DGA domain names and can also meet the fast-detection requirement of live-network use.
Keywords: domain generation algorithm; machine learning; character analysis; access behavior analysis; distributed processing
Classification code: TP393.0 [Automation and Computer Technology: Computer Application Technology]
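
The abstract says the classification module distinguishes normal from suspected malicious domain names by analyzing domain name character features, but this record does not list the features themselves. The sketch below is only an illustration of what such character features might look like: the function name `char_features` and the four statistics (label length, Shannon entropy, digit ratio, vowel ratio) are assumptions chosen for the example, not the feature set used by the authors.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def char_features(domain: str) -> dict:
    """Compute simple character statistics for the first label of a domain.

    Hypothetical feature set for illustration only; the paper's actual
    features are not given in this record. Note that taking the first
    label is a simplification ("www.example.com" yields "www").
    """
    label = domain.lower().rstrip(".").split(".")[0]
    n = len(label)
    if n == 0:
        return {"length": 0, "entropy": 0.0, "digit_ratio": 0.0, "vowel_ratio": 0.0}

    # Shannon entropy of the character distribution, in bits.
    counts = Counter(label)
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())

    return {
        "length": n,
        "entropy": entropy,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
    }


if __name__ == "__main__":
    for name in ("example.com", "x7f3k9q2zt1b.net"):
        print(name, char_features(name))
```

Feature vectors of this kind could in principle be passed to a random forest classifier as the first, lightweight stage, with the suspected domains then handed to a clustering stage; how the described system actually wires these stages together on Spark is not specified here.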