Authors: ZHOU Junya (周军芽), WU Jinwei (吴进伟), WU Guangfei (吴广飞), ZHANG Hewei (张何为) (Lishui Power Supply Company, State Grid Zhejiang Electric Power Co., Ltd., Lishui 323000, China; affiliation not given)
Institution: [1] Lishui Power Supply Company, State Grid Zhejiang Electric Power Co., Ltd., Lishui 323000, Zhejiang, China
Source: Journal of Wuhan University of Technology: Information & Management Engineering (武汉理工大学学报(信息与管理工程版)), 2024, No. 2, pp. 312-316 (5 pages)
Abstract: To identify and handle sensitive words accurately, and to address the high word-segmentation latency and low recognition accuracy of existing approaches, a short-text sensitive word recognition method based on a bidirectional long short-term memory (Bi-LSTM) neural network is proposed. The sensitive lexicon is analyzed and divided into two categories and three levels, and interference information in short texts (special characters, traditional Chinese characters, and split Chinese characters) is preprocessed. A Bi-LSTM neural network is then used to build a short-text word segmentation model, whose optimal parameters are determined through a second round of training. The sensitivity values of words are computed repeatedly, and sensitive words are extracted with a sensitivity comparison function. The extracted words are matched against the sensitive lexicon to determine their category and level, which completes the recognition of sensitive words in short text. Experimental results show that, across the different experimental groups, the short-text segmentation latency of the proposed method stays below the given maximum limit and the sensitive word recognition accuracy exceeds 84.42%, indicating good practical performance.
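The abstract describes a pipeline of preprocessing, segmentation, sensitivity scoring, and lexicon matching. The Python sketch below illustrates only the first and last stages, under explicit assumptions: the lexicon entries, the category labels `category_1`/`category_2`, the three-entry traditional-to-simplified table `TRAD_TO_SIMP`, and the split-character table `SPLIT_PARTS` are all hypothetical. The paper's Bi-LSTM segmentation model and sensitivity comparison function are not reproduced; this sketch substitutes a plain substring scan of the cleaned text against the lexicon.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon: word -> (category, level). The paper uses two
# categories and three levels; these labels and entries are invented.
LEXICON = {
    "赌博": ("category_1", 3),
    "代开发票": ("category_2", 2),
}

# Tiny illustrative traditional -> simplified map (assumption: a real system
# would use a complete conversion table).
TRAD_TO_SIMP = str.maketrans({"賭": "赌", "開": "开", "發": "发"})

# Hypothetical table re-joining a character written as its components
# (赌 = 贝 + 者).
SPLIT_PARTS = {"贝者": "赌"}


def preprocess(text: str) -> str:
    """Strip special characters, convert traditional chars, merge split chars."""
    # Keep only CJK ideographs and ASCII letters/digits.
    text = re.sub(r"[^\u4e00-\u9fffA-Za-z0-9]", "", text)
    text = text.translate(TRAD_TO_SIMP)
    for parts, whole in SPLIT_PARTS.items():
        text = text.replace(parts, whole)
    return text


@dataclass
class Hit:
    word: str
    category: str
    level: int
    start: int  # index in the preprocessed text


def find_sensitive(text: str) -> list[Hit]:
    """Return every lexicon entry found in the preprocessed text, in order."""
    cleaned = preprocess(text)
    hits = []
    for word, (category, level) in LEXICON.items():
        start = cleaned.find(word)
        while start != -1:
            hits.append(Hit(word, category, level, start))
            start = cleaned.find(word, start + 1)
    return sorted(hits, key=lambda h: h.start)
```

For example, `find_sensitive("这里可以*贝者*博,还能代開發票")` cleans the input to `这里可以赌博还能代开发票` and reports `赌博` (category_1, level 3) and `代开发票` (category_2, level 2). Merging components by plain string replacement can produce false positives when the component characters legitimately appear side by side, so a practical system would need context before merging.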
Keywords: short text; sensitive word recognition; text filtering; edit distance; bidirectional long short-term memory neural network
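Edit distance appears among the keywords, but the abstract does not say how the method uses it. One common use in sensitive word filtering is fuzzy matching, so that a variant differing from a lexicon entry by a single character is still flagged. The following Python sketch assumes that use; the function names, the `max_dist=1` threshold, and the example words are illustrative assumptions, not details from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_matches(token: str, lexicon: list[str], max_dist: int = 1) -> list[str]:
    """Return lexicon words within max_dist edits of token (hypothetical helper)."""
    return [w for w in lexicon if edit_distance(token, w) <= max_dist]
```

With this threshold, `fuzzy_matches("代开法票", ["代开发票", "赌博"])` returns `["代开发票"]`, because one substitution separates the two strings. Raising `max_dist` catches more evasive variants at the cost of more false positives, especially for short words.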
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]