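Authors: YAN Jin, DONG Kejun, LI Hongtao (China Internet Network Information Center, Beijing 100190, China)
Source: Frontiers of Data & Computing (《数据与计算发展前沿》), 2024, No. 3, pp. 127-138 (12 pages)
Funding: National Key Research and Development Program of China project "Key Information Analysis Technologies for Internet Infrastructure" (2022YFB3105003).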
Abstract:
[Objective] Web trackers are embedded in the websites users visit, where they collect user identifiers and browsing information for purposes such as personalized recommendation and website performance analysis. However, web trackers can also leak Internet users' private information. Letting users selectively turn web tracking on or off is therefore important for the healthy development of the Internet, and automatic detection of web trackers is the prerequisite for doing so.
[Methods] An analysis of real-world data reveals two important characteristics of web trackers: the text semantics of their URLs and their embedding associations (i.e., co-occurrence) across websites. Building on these findings, this paper designs a deep-learning web tracker detection method that fuses the association features and semantic features of URLs. The method first constructs a bipartite graph of embedding relationships between the websites users visit directly and the URLs embedded in those websites, and applies the DeepWalk algorithm to extract an embedding feature vector for each URL. Second, it extracts text semantic features from the URL strings with a pre-trained BERT model from natural language processing. Finally, it fuses the two types of features with an attention mechanism and classifies URLs with a multi-layer perceptron, thereby identifying web trackers.
[Results] Experiments on real-world datasets show that the proposed method detects trackers more accurately than existing methods, reaching an F1 score of 0.91.
[Conclusions] The proposed method relies only on tracker URLs and their embedding relationships within websites, achieves relatively high detection accuracy, and is easy to deploy in practice.
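The graph stage and the fusion stage of the method can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_bipartite`, `random_walks`, and `attention_fuse` are hypothetical names, the walks use DeepWalk's uniform neighbour sampling, and the attention step assumes a simple dot-product score against a query vector, since the abstract does not specify the exact attention form. In a full pipeline the walks would be fed as "sentences" to a skip-gram model (for example gensim's `Word2Vec`) to obtain URL embeddings, the BERT vectors would come from a pre-trained model, and both would be projected to a common dimension before fusion and MLP classification. Those steps are omitted here.

```python
import math
import random
from collections import defaultdict


def build_bipartite(pairs):
    """Undirected adjacency for the site-URL embedding graph.

    pairs: iterable of (site, embedded_url) tuples, e.g. from crawl logs.
    Nodes are prefixed so site names and URLs never collide.
    """
    adj = defaultdict(set)
    for site, url in pairs:
        s, u = "site:" + site, "url:" + url
        adj[s].add(u)
        adj[u].add(s)
    # Sort neighbours so walks are reproducible for a fixed seed.
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def random_walks(adj, num_walks=10, walk_length=8, seed=0):
    """DeepWalk-style truncated random walks with uniform neighbour choice.

    Every node has at least one neighbour (it came from a pair),
    so rng.choice never sees an empty list.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a new order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def attention_fuse(features, query):
    """Softmax attention over per-view feature vectors (assumed form).

    features: equal-length vectors, e.g. [graph_vec, bert_vec] after
    projection to a common dimension.
    query: learned in a real model; supplied by hand here (assumption).
    Returns (attention weights, fused vector).
    """
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    top = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * f[d] for w, f in zip(weights, features))
             for d in range(len(query))]
    return weights, fused


if __name__ == "__main__":
    # Hypothetical crawl data: two sites share one embedded tracker URL.
    pairs = [
        ("news.example", "tracker.example/pixel.gif"),
        ("shop.example", "tracker.example/pixel.gif"),
        ("shop.example", "cdn.example/app.js"),
    ]
    adj = build_bipartite(pairs)
    walks = random_walks(adj, num_walks=2, walk_length=5)
    print(len(walks))  # 4 nodes x 2 passes = 8 walks
    print(attention_fuse([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
    # ([0.5, 0.5], [0.5, 0.5]): a zero query weights both views equally
```

Because site nodes link only to URL nodes and vice versa, every walk alternates between the two sides of the graph, so URLs embedded by the same sites appear close together in the walk corpus. That co-occurrence is the association signal the skip-gram step would turn into URL embedding vectors.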