A DGA Domain Name Detection Method Based on Deep Learning Models with Mixed Word Embedding

Times cited: 22

Authors: Du Peng (杜鹏), Ding Shifei (丁世飞)

Affiliations: [1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116; [2] Engineering Research Center of Mine Digitization (China University of Mining and Technology), Ministry of Education, Xuzhou, Jiangsu 221116

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2020, No. 2, pp. 433-446 (14 pages)

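Funding: National Natural Science Foundation of China (61672522, 61976216, 61379101); Postgraduate Research Innovation Program of Jiangsu Province (KYCX19_2196); Graduate Research Innovation Program of China University of Mining and Technology (KYCX19_2196)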

Abstract: Detecting domain names produced by domain generation algorithms (DGAs) plays a key role in defending against botnet attacks. It has practical value for generating threat intelligence, blocking botnet command-and-control (C&C) traffic, and maintaining cyber security. In recent years, DGA domain name detection has progressed from methods that rely on manually crafted features to deep learning methods that extract features automatically, and multiple studies have indicated that deep learning methods perform better on this task. However, in the multi-class classification of DGA domain names by botnet family, there are many families and the data is imbalanced across them, so many existing deep learning models still leave room for improvement on this task. To address these problems, this paper designs a mixed word embedding that combines character-level and bigram-level embeddings to make fuller use of the information in a domain name string, and builds a deep learning model on top of it. Finally, an experiment with multiple comparison models is conducted to verify the effectiveness of the mixed word embedding. The experimental results show that the model based on the mixed word embedding outperforms models based only on character-level embedding in both DGA domain name detection and multi-class classification, especially on small DGA families with few samples, which demonstrates the effectiveness of the proposed approach.
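The mixed word embedding described in the abstract starts from two views of the same domain string: its characters and its overlapping two-character units (bigrams). The sketch below shows one plausible way to build that input. It is a minimal illustration under stated assumptions, not the authors' implementation: the record does not say how the paper tokenizes, pads, or combines the two embeddings, so the helper names (`char_tokens`, `bigram_tokens`, `mixed_embedding`, etc.), the padding length, the embedding size, and the choice to concatenate the two vectors position by position are all hypothetical. In a full model, per the keywords, a CNN or LSTM layer would consume the resulting sequence; that part is not shown.

```python
import random

# Hypothetical preprocessing for a mixed (character + bigram) embedding input.
# Tokenization, padding length, and the concatenation strategy are assumptions;
# the paper's exact design is not given in this record.

PAD = 0  # index reserved for padding
UNK = 1  # index reserved for tokens unseen during vocabulary building


def char_tokens(domain: str) -> list[str]:
    """Split a domain label into single lowercase characters."""
    return list(domain.lower())


def bigram_tokens(domain: str) -> list[str]:
    """Split a domain label into overlapping two-character units."""
    s = domain.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def build_vocab(token_lists: list[list[str]]) -> dict[str, int]:
    """Assign each distinct token an integer id, starting after PAD and UNK."""
    vocab: dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode(tokens: list[str], vocab: dict[str, int], max_len: int) -> list[int]:
    """Map tokens to ids, truncating or right-padding with PAD to max_len."""
    ids = [vocab.get(tok, UNK) for tok in tokens[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def make_table(vocab_size: int, dim: int, seed: int) -> list[list[float]]:
    """Randomly initialised embedding table (it would be learned in training)."""
    rng = random.Random(seed)
    table = [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]
    table[PAD] = [0.0] * dim  # padding positions map to a zero vector
    return table


def mixed_embedding(domain, char_vocab, bigram_vocab, char_table, bigram_table, max_len):
    """Concatenate the character vector and the bigram vector at each position."""
    c_ids = encode(char_tokens(domain), char_vocab, max_len)
    b_ids = encode(bigram_tokens(domain), bigram_vocab, max_len)
    return [char_table[c] + bigram_table[b] for c, b in zip(c_ids, b_ids)]


if __name__ == "__main__":
    train = ["google", "xjwqpzkd"]  # toy examples, not data from the paper
    char_vocab = build_vocab([char_tokens(d) for d in train])
    bigram_vocab = build_vocab([bigram_tokens(d) for d in train])
    char_table = make_table(len(char_vocab) + 2, 8, seed=0)
    bigram_table = make_table(len(bigram_vocab) + 2, 8, seed=1)
    x = mixed_embedding("google", char_vocab, bigram_vocab,
                        char_table, bigram_table, max_len=12)
    print(len(x), len(x[0]))  # 12 16
```

Concatenation is only one way to mix the two views; summing the vectors, or feeding each sequence into its own branch and merging the branch outputs, are equally plausible readings of "mixed word embedding". The bigram view gives each position information about its neighbour, which is the kind of extra string information the abstract credits for the gains on small families.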

Keywords: domain generation algorithm; mixed word embedding; deep learning; convolutional neural network; long short-term memory network

CLC number: TP391 (Automation and Computer Technology / Computer Application Technology)
