Phishing Detection Algorithm Based on Language Features of URL  Cited by: 8

Authors: WANG Yuqi, LIU Bowen, LIN Guoyuan [1,2,3]

Affiliations: [1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China; [2] Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou, Jiangsu 221116, China; [3] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

Source: Computer Engineering and Applications, 2019, No. 24, pp. 84-90 (7 pages)

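Funding: Jiangsu Province Industry-University-Research Prospective Joint Research Project (No. BY2016026-04); Open Fund of the State Key Laboratory for Novel Software Technology (No. KFKT2018B27)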

Abstract: To counter the detection-evasion strategies used by phishing websites, a phishing detection algorithm based on the language features of URLs is proposed. By analyzing how the URLs of phishing websites and legitimate websites differ across detection domains, the concepts of motif and sensitivity are defined to describe these language features. The algorithm first checks the similarity of the main-level domain based on motifs. When the similarity falls below a preset threshold, effective subdomain features are selected, and the random forest algorithm is used to learn and classify the language features of the subdomains. Experimental results show that the algorithm achieves an accuracy of 95.6% with a relatively short running time; the average recognition time is less than 1 s.
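The two-stage control flow described in the abstract can be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' implementation: the abstract does not give the definitions of motif or sensitivity, so a generic string similarity (`difflib.SequenceMatcher`) stands in for the motif-based check; the `PROTECTED` list, the threshold of 0.8, the feature names, and the rule that a near-match to a protected name counts as phishing are all hypothetical. The subdomain classifier is passed in as a callable; in the paper that role is played by a trained random forest.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit

# Hypothetical list of protected main-level domain labels (not from the paper).
PROTECTED = ("paypal", "alipay", "taobao")


def split_host(url: str) -> Tuple[str, str]:
    """Return (subdomain part, main-level domain label).

    Naive split: assumes a one-label public suffix such as ".com".
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    if len(labels) < 2:
        return "", host
    return ".".join(labels[:-2]), labels[-2]


def max_similarity(label: str) -> float:
    """Highest similarity ratio between the label and any protected name."""
    return max(SequenceMatcher(None, label, p).ratio() for p in PROTECTED)


def subdomain_features(sub: str) -> Dict[str, int]:
    """Illustrative lexical features; the paper's actual feature set differs."""
    return {
        "length": len(sub),
        "num_labels": sub.count(".") + 1 if sub else 0,
        "num_digits": sum(c.isdigit() for c in sub),
        "num_hyphens": sub.count("-"),
        "has_protected_word": int(any(p in sub for p in PROTECTED)),
    }


def classify(url: str,
             subdomain_model: Callable[[Dict[str, int]], bool],
             threshold: float = 0.8) -> str:
    """Stage 1: main-level domain similarity. Stage 2: subdomain features."""
    sub, main = split_host(url)
    if main in PROTECTED:
        # Assumption: an exact match to a protected name is legitimate.
        return "legitimate"
    if max_similarity(main) >= threshold:
        # Assumption: a near-match (look-alike) is treated as phishing.
        return "phishing"
    # Similarity below the threshold: fall back to the subdomain classifier.
    return "phishing" if subdomain_model(subdomain_features(sub)) else "legitimate"
```

For example, `classify("http://paypa1.com/login", model)` is decided in the first stage because `paypa1` closely resembles `paypal`, whereas `http://paypal.secure-login.example.com/` has an unrelated main-level domain and is handed to the subdomain classifier, where the embedded brand name can be picked up as a feature.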

Keywords: phishing website; uniform resource locator (URL); language features; motif; sensitivity

Classification code: TP393 [Automation and Computer Technology: Computer Application Technology]

 
