Authors: SHI Lei; DENG Guiying; ZHANG Heng[1]; LIU Yu; XIAO Jianfang[1] (China Internet Network Information Center, Beijing 100190, China)
Source: Microcomputer Applications (《微型电脑应用》), 2024, No. 6, pp. 242-246 (5 pages)
Abstract: Since the beginning of the 21st century, a large number of Web pages have been built on the Internet, providing people with all kinds of information; this has not only sped up information exchange but also greatly reduced the cost of circulating information. At the same time, bad websites keep emerging in large numbers. However, bad Web pages are still identified mostly by manual review, which cannot cope with their large-scale emergence. This paper therefore proposes a multimodal, efficient bad Web page clustering algorithm based on HDBSCAN. HDBSCAN is first used to perform a preliminary clustering of bad Web page images. The preliminary clusters are then further classified and merged by additionally using several information elements, such as bad Web page text information and bad Web page structure information, so that similar Web pages are merged into one large, comprehensive image set. Experimental results show that, compared with HDBSCAN, the improved clustering algorithm improves clustering quality, achieves better clustering results, and markedly improves the efficiency of handling bad websites.
Classification No.: TN91 [Electronics and Telecommunications: Communication and Information Systems]
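
The abstract describes a two-stage pipeline: a preliminary HDBSCAN clustering of page images, followed by a merge of those clusters using text and structure information. The record does not say how the merge is carried out, so the following is only a minimal sketch of what the second stage might look like, not the authors' method. It assumes the preliminary labels are already available (for example from an HDBSCAN run, with -1 marking noise), represents page text and structure as plain sets, and merges two clusters when both Jaccard similarities reach a threshold. The function names, the set-based features, and the 0.5 thresholds are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_clusters(labels, text_tokens, dom_tags, text_thr=0.5, struct_thr=0.5):
    """Merge preliminary image clusters whose pages share text and structure.

    labels      -- one preliminary cluster label per page (-1 = noise);
                   assumed to come from an earlier HDBSCAN run on image features
    text_tokens -- one set of text tokens per page (hypothetical feature)
    dom_tags    -- one set of structural features per page, e.g. tag names
                   (hypothetical feature)
    Returns a new label list in which merged clusters share the smallest
    label of their group; noise pages keep the label -1.
    """
    # Pool the text and structure features of all pages in each cluster.
    clusters = {}
    for lab, toks, tags in zip(labels, text_tokens, dom_tags):
        if lab == -1:
            continue
        pooled_text, pooled_struct = clusters.setdefault(lab, (set(), set()))
        pooled_text.update(toks)
        pooled_struct.update(tags)

    # Union-find over cluster labels, so merges are transitive.
    parent = {lab: lab for lab in clusters}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge every pair of clusters that is similar in BOTH modalities.
    for a, b in combinations(sorted(clusters), 2):
        text_sim = jaccard(clusters[a][0], clusters[b][0])
        struct_sim = jaccard(clusters[a][1], clusters[b][1])
        if text_sim >= text_thr and struct_sim >= struct_thr:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

    return [lab if lab == -1 else find(lab) for lab in labels]


# Toy data: clusters 0 and 1 look alike in text and structure, cluster 2 does not.
labels = [0, 0, 1, 1, 2, -1]
text_tokens = [
    {"casino", "bonus"}, {"casino", "win"},
    {"casino", "bonus", "win"}, {"bonus"},
    {"pharmacy", "pills"}, {"casino"},
]
dom_tags = [
    {"div", "img", "a"}, {"div", "img"},
    {"div", "img", "a"}, {"div"},
    {"table", "form"}, {"div"},
]
merged = merge_clusters(labels, text_tokens, dom_tags)
print(merged)
```

Requiring agreement in both modalities is a conservative choice made for this sketch; a weighted combination of the two similarities would be an equally plausible reading of the abstract.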