Authors: 赵天舒 (ZHAO Tianshu); 沈颖 (SHEN Ying); 李柏岩 (LI Baiyan)[1]; 刘晓强 (LIU Xiaoqiang)[1]; 朱旻 (ZHU Min)
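Affiliations: [1] School of Computer Science and Technology, Donghua University, Shanghai 201620, China; [2] Shanghai Key Laboratory of Computer Software Testing and Evaluating, Shanghai 201112, China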
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2024, No. 4, pp. 215-221 (7 pages)
Abstract: The casual, unconstrained style of online language means that word variants appear frequently on web pages, which poses a challenge to web information security. This paper addresses the detection of Chinese sensitive-word variants and proposes a fast detection method based on an extended Trie tree. First, the types of Chinese sensitive-word variants are categorized and, drawing on the characteristics of Chinese sensitive words, an extended Trie tree is built by enriching the information stored within nodes and the links between nodes. Next, the Trie tree is searched according to the generation rules of Chinese variants. Finally, a BERT-based binary classifier performs a second-pass check on the search results to reduce the false detection rate. Experiments show that the algorithm achieves 98.69% accuracy and 94.25% recall, recognizes common Chinese sensitive-word variants, and meets application requirements for time efficiency.
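The abstract does not give the node layout or the variant rules, so the following is only a minimal sketch of the general idea of variant-tolerant Trie lookup, not the authors' extended Trie. It assumes two illustrative variant types: a substitute character mapped back to its canonical form, and up to `max_skip` non-alphanumeric filler characters inserted inside a word. The names `VariantTrie`, `variant_map`, and `max_skip` are hypothetical, and the BERT-based second-pass classifier is not included.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set on the node that ends a lexicon word


class VariantTrie:
    """Plain Trie with two assumed tolerance rules (not taken from the paper)."""

    def __init__(self, variant_map: Optional[Dict[str, str]] = None,
                 max_skip: int = 2) -> None:
        self.root = TrieNode()
        # Assumed rule 1: map a substitute character to its canonical form.
        self.variant_map = variant_map or {}
        # Assumed rule 2: tolerate a few filler symbols inside a word.
        self.max_skip = max_skip

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word

    def _match_from(self, text: str, start: int) -> Optional[Tuple[int, str]]:
        """Longest lexicon word matched from `start`, as (end_index, word)."""
        node = self.root
        best = None
        skipped = 0
        i = start
        while i < len(text):
            ch = self.variant_map.get(text[i], text[i])
            nxt = node.children.get(ch)
            if nxt is not None:
                node = nxt
                skipped = 0
                if node.word is not None:
                    best = (i + 1, node.word)
            elif (node is not self.root and skipped < self.max_skip
                  and not text[i].isalnum()):
                skipped += 1  # filler symbol inside a partially matched word
            else:
                break
            i += 1
        return best

    def find(self, text: str) -> List[Tuple[int, int, str]]:
        """Return non-overlapping hits as (start, end, canonical_word)."""
        hits = []
        i = 0
        while i < len(text):
            match = self._match_from(text, i)
            if match:
                end, word = match
                hits.append((i, end, word))
                i = end
            else:
                i += 1
        return hits


trie = VariantTrie(variant_map={"堵": "赌"})
trie.add("赌博")
print(trie.find("网上赌*博"))
print(trie.find("堵博"))
```

In this sketch the filler rule only applies after at least one character of a word has matched, so stray punctuation elsewhere in the text does not start a match. A pipeline like the one the abstract describes would then pass each hit, with its surrounding context, to a classifier to filter out false positives.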
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]