Chinese Sensitive Word Variant Detection Based on an Extended Trie Tree

Cited by: 1

Authors: ZHAO Tianshu; SHEN Ying; LI Baiyan [1]; LIU Xiaoqiang [1]; ZHU Min (School of Computer Science and Technology, Donghua University, Shanghai 201620, China; Shanghai Key Laboratory of Computer Software Testing and Evaluating, Shanghai 201112, China)

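Affiliations: [1] School of Computer Science and Technology, Donghua University, Shanghai 201620; [2] Shanghai Key Laboratory of Computer Software Testing and Evaluating, Shanghai 201112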

Source: Intelligent Computer and Applications (《智能计算机与应用》), 2024, Issue 4, pp. 215-221 (7 pages)

Abstract: The casual, free-form nature of online language means that word variants appear frequently on web pages, which poses a challenge to web information security. To address the detection of Chinese sensitive word variants, this paper proposes a fast detection method based on an extended Trie tree. First, the types of Chinese sensitive word variants are categorized and, drawing on the characteristics of Chinese sensitive words, an extended Trie tree is built by enriching the information stored in each node and the links between nodes. Next, the Trie tree is searched according to the generation rules of Chinese variants. Finally, a BERT-based binary classifier makes a second-pass judgment on the retrieval results to reduce the false detection rate. Experiments show that the algorithm achieves a precision of 98.69% and a recall of 94.25%, recognizes common Chinese sensitive word variants, and meets application requirements for time efficiency.
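The abstract does not spell out the node layout or the variant rules, so the following is only a minimal sketch of the general idea: a Trie whose lookup tolerates two assumed variant types, filler symbols inserted inside a word and single-character substitutions. The class name `VariantTrie`, the `equivalents` map, the `noise` set, and the neutral sample word are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    word: Optional[str] = None  # set on the node that ends a lexicon word


class VariantTrie:
    """Trie with variant-tolerant search (illustrative assumption, not the paper's design)."""

    def __init__(self, equivalents: Optional[Dict[str, str]] = None, noise: str = ""):
        self.root = Node()
        # Hypothetical rule 1: map a variant character to its canonical form.
        self.equivalents = equivalents or {}
        # Hypothetical rule 2: filler characters that may be inserted inside a word.
        self.noise = set(noise)

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.word = word

    def search(self, text: str) -> List[Tuple[int, int, str]]:
        """Return (start, end, canonical_word) per match; end is exclusive."""
        hits = []
        for start in range(len(text)):
            node = self.root
            for i in range(start, len(text)):
                ch = text[i]
                if ch in self.noise and node is not self.root:
                    continue  # skip filler once a candidate match has begun
                ch = self.equivalents.get(ch, ch)  # normalize substitutions
                node = node.children.get(ch)
                if node is None:
                    break
                if node.word is not None:
                    hits.append((start, i + 1, node.word))
        return hits


trie = VariantTrie(equivalents={"平": "苹"}, noise="*-_ ")
trie.add("苹果")  # neutral placeholder standing in for a lexicon entry
print(trie.search("我喜欢苹*果"))
print(trie.search("平果"))
```

Rule-based matching of this kind can over-trigger, since a substitution map applies regardless of context. That is the role the abstract assigns to the BERT-based binary classifier, which re-examines each candidate match; the classifier stage is not sketched here.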

Keywords: sensitive words; word variants; Trie tree; BERT

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
