Research on Password Guessing Models Based on Recurrent Neural Networks and Generative Adversarial Networks (times cited: 17)

Password Guessing Based on Recursive Neural Networks and Generative Adversarial Networks


Authors: WANG Ding, ZOU Yun-Kai, TAO Yi, WANG Bin[3]

Affiliations: [1] College of Cyber Science, Nankai University, Tianjin 300350; [2] Tianjin Key Laboratory of Network and Data Security Technology (Nankai University), Tianjin 300350; [3] School of Electronics Engineering and Computer Science, Peking University, Beijing 100871

Source: Chinese Journal of Computers, 2021, Issue 8, pp. 1519-1534 (16 pages)

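Funding: National Natural Science Foundation of China (61802006); National Key Research and Development Program of China, "Frontier Science and Technology Innovation" key special project.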

Abstract: Progress in deep learning offers a potential new way to improve the efficiency of password guessing. Existing work has already applied deep learning models such as Recurrent Neural Networks (RNN) and Generative Adversarial Networks (GAN) to the design of password guessing models. Building on implementations of the RNN model, the Long Short-Term Memory (LSTM) model, the PL model (a model-level combination of Probabilistic Context-Free Grammar (PCFG) and LSTM), and the GAN model, this paper proposes replacing the LSTM in the PL model with an RNN, fusing PCFG and RNN at the model level into the PR model. To reduce the guessing model's dependence on large training samples, we further propose the PR+ model, which uses an RNN to generate letter sequences that fill the letter segments of passwords. Experiments on 4 large-scale real-world password datasets show that the cracking rate of the PR model is slightly higher than that of the PL model and, on most datasets, significantly higher than those of the traditional mainstream PCFG model (at the 10⁷-guess scale) and Markov model (at the 10⁶-guess scale). The training efficiency of the PR model is also far better than that of the PL model. Because different password models generate guesses with different characteristics, we combine the guess sets produced by different models into new guess sets and compare several combination methods on the same 4 datasets. To the best of our knowledge, we are the first to confirm that, under the same number of guesses (10⁷ to 10⁸), a combined guess set drawn from different types of models usually achieves a higher cracking rate than any single guess set. For the GAN model, the cracking rate is only 31.41% at 3.6×10⁸ guesses, which indicates that GAN is less efficient at password cracking than traditional probability- and statistics-based models (such as PCFG and Markov) and RNN-based models; we further analyze the reasons for its poor performance.
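The abstract states that guess sets from different models were combined and compared at a fixed number of guesses, but it does not say how they were combined. As a minimal sketch under that framing, the Python below assumes one hypothetical strategy: round-robin interleaving of each model's ranked guesses, with duplicates removed and the result truncated to a fixed budget. The cracking rate is taken as the fraction of test passwords that appear in the guess set. The function names `combine_round_robin` and `cracking_rate` and the toy data are illustrative assumptions, not the paper's method or results.

```python
from itertools import zip_longest
from typing import Iterable, List, Sequence


def combine_round_robin(guess_lists: Sequence[Sequence[str]], budget: int) -> List[str]:
    """Hypothetical combination strategy: interleave ranked guess lists
    rank by rank, skip duplicates, stop after `budget` distinct guesses."""
    if budget <= 0:
        return []
    seen = set()
    combined: List[str] = []
    # zip_longest yields one tuple per rank; exhausted lists contribute None
    for rank_slice in zip_longest(*guess_lists):
        for guess in rank_slice:
            if guess is None or guess in seen:
                continue
            seen.add(guess)
            combined.append(guess)
            if len(combined) == budget:
                return combined
    return combined


def cracking_rate(guesses: Iterable[str], test_set: Sequence[str]) -> float:
    """Fraction of test passwords (duplicates counted) found in the guesses."""
    if not test_set:
        return 0.0
    guess_set = set(guesses)
    cracked = sum(1 for pw in test_set if pw in guess_set)
    return cracked / len(test_set)


if __name__ == "__main__":
    # Toy ranked outputs standing in for two models (not real data)
    model_a = ["123456", "password", "iloveyou", "qwerty"]
    model_b = ["123456", "abc123", "password1", "qwerty"]
    test_set = ["123456", "abc123", "iloveyou", "dragon"]

    budget = 4
    combined = combine_round_robin([model_a, model_b], budget)
    print(combined)  # ['123456', 'password', 'abc123', 'iloveyou']
    print(cracking_rate(model_a[:budget], test_set))  # 0.5
    print(cracking_rate(model_b[:budget], test_set))  # 0.5
    print(cracking_rate(combined, test_set))          # 0.75
```

Interleaving by rank gives each model's highest-ranked guesses equal priority. This is one plausible way a combined set can outperform either single set at the same budget when the models crack different passwords. Weighting the models unequally would be a straightforward variant.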

Keywords: password; guessing attack; deep learning; recurrent neural network; generative adversarial network

CLC number: TP309 [Automation and Computer Technology: Computer Systems Architecture]
