Author: WEI Huan (魏欢) [1], College of Computer and Arts, Anhui Technical College of Industry and Economy, Hefei 230051, China
Source: Journal of Lanzhou Institute of Technology (《兰州工业学院学报》), 2019, No. 4, pp. 76-80 (5 pages)
Funding: Anhui Province Quality Engineering Project (2015M00C144)
Abstract: To improve the detection of camouflaged (cloaked) spam web pages, a detection algorithm based on binary classification is proposed. Hidden-link ("dark link") domain-name features and web-crawler analysis are applied to page samples collected from various types of websites, and text and image features related to the distribution of camouflaged spam pages are constructed. A vertical crawler and abnormal-feature mining are used to filter spam information from the camouflaged spam page sample set. With weighted web-page spam information as the test set, binary classification is used for path-template analysis of camouflaged spam pages, and all abnormal samples are retrieved with the vertical crawler. The font color of the relevant text and the page background color are extracted, and the extracted features are fed into a binary semantic classifier for data classification; combined with a big-data fusion clustering method, this yields the detection of camouflaged spam pages. Simulation results show that the method detects camouflaged spam pages with high accuracy, resists interference from spam pages and hidden links well, and improves web page security monitoring.
Classification: TP393 [Automation and Computer Technology: Computer Application Technology]
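The abstract names the text font color and the page background color as features passed to a binary classifier, but this record gives no further detail. The following is only a minimal sketch of one way such a feature could be computed, not the paper's method. It assumes colors arrive as `#rrggbb` strings, uses the WCAG contrast-ratio formula as the feature, and stands in a simple threshold rule for the classifier. The function names and the threshold of 1.5 are hypothetical.

```python
def parse_hex(color):
    """Parse a '#rrggbb' string into an (r, g, b) tuple of 0-255 integers."""
    color = color.lstrip("#")
    if len(color) != 6:
        raise ValueError(f"expected #rrggbb, got {color!r}")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def relative_luminance(rgb):
    """Relative luminance of an sRGB color, as defined by WCAG 2.0."""
    def channel(c):
        s = c / 255.0
        return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4

    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(font_color, background_color):
    """WCAG contrast ratio between two colors; 1.0 means identical luminance."""
    l1 = relative_luminance(parse_hex(font_color))
    l2 = relative_luminance(parse_hex(background_color))
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def looks_hidden(font_color, background_color, threshold=1.5):
    """Hypothetical binary rule: flag text whose color nearly matches the background.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return contrast_ratio(font_color, background_color) < threshold
```

For example, `looks_hidden("#fefefe", "#ffffff")` flags near-white text on a white page, while `looks_hidden("#000000", "#ffffff")` does not. In a fuller pipeline the contrast ratio would be one feature among several (such as link-domain and path-template features) rather than a decision rule on its own.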