Towards Typosquatting Abuse Detection using Bi-directional LSTM (Chinese title: 基于双向LSTM的误植域名滥用检测方法)  Cited by: 5

Authors: LV Pin (吕品)[1,2]; LI Quan-gang (李全刚); LIU Ting-wen (柳厅文)[1]; NING Zhen-hu (宁振虎)[3]; WANG Yu-bin (王玉斌); SHI Jin-qiao (时金桥)[1]; FANG Bin-xing (方滨兴)

Affiliations: [1] Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; [2] School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China; [3] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; [4] University of Electronic Science and Technology Guangdong Institute of Electronic Information Engineering, Dongguan, Guangdong 523808, China

Source: Acta Electronica Sinica (《电子学报》), 2018, No. 9, pp. 2081-2086 (6 pages)

Funding: National Key Research and Development Program of China (2016YFB0801003); Dongguan Program for Introducing Innovative Research Teams (No. 201636000100038)

Abstract: Existing typosquatting detection methods rely mainly on the edit distance between pairs of domain names. They do not fully exploit the context information in domain names and tend to produce many false positives for short domains. Collecting additional information about the domains can improve detection but introduces considerable overhead. This paper therefore adopts a lightweight detection strategy based only on domain name strings and introduces a bi-directional long short-term memory (LSTM) model to make full use of domain context. It also designs a locality-sensitive hashing function for domain names to speed up typosquatting detection over large-scale domain sets. Experiments on real-world data sets show that the proposed method overcomes the shortcomings of edit-distance-based detection and detects typosquatting abuse effectively.
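As an illustration of the edit-distance baseline that the abstract criticizes, the following is a minimal sketch and not the paper's method. The function names `edit_distance` and `flag_by_edit_distance`, the threshold `max_dist=1`, and the sample labels are assumptions chosen for the example. It shows why short names are prone to false positives: one edit is a large fraction of a three-character label, so unrelated short names fall within the threshold.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def flag_by_edit_distance(candidate: str,
                          targets: Sequence[str],
                          max_dist: int = 1) -> List[str]:
    """Return targets within max_dist edits of candidate, excluding exact matches.

    max_dist=1 is an assumed threshold for this sketch, not a value from the paper.
    """
    return [t for t in targets if 0 < edit_distance(candidate, t) <= max_dist]


# Hypothetical target labels (second-level names without the TLD).
targets = ["google", "bbc"]

# A plausible typo of a long name is flagged, as intended.
print(flag_by_edit_distance("gogle", targets))

# An unrelated short name is also flagged, because a single substitution
# already covers a third of a three-character label.
print(flag_by_edit_distance("abc", targets))
```

A pure distance threshold cannot tell these two cases apart, which is the gap the abstract proposes to close by modeling the character context of the domain string.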

Keywords: typosquatting; edit distance; bi-directional LSTM; context information; locality-sensitive hashing

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
