Authors: Gu Mengjun (顾孟钧); Feng Wenzhou (冯文舟); Chen Zhongbing (陈中兵)
Affiliations: [1] China Telecom Zhejiang Branch, Hangzhou, Zhejiang 310000; [2] Public Security Bureau of Linhai City, Taizhou, Zhejiang 318000; [3] Zhejiang Public Information Industry Co., Ltd., Hangzhou, Zhejiang 310000
Source: Industry Information Security (《工业信息安全》), 2022, No. 7, pp. 28-35 (8 pages)
Abstract: In response to the growing spam problem, this paper compares Chinese spam classification models across different text lengths using a variety of algorithms. First, the naive Bayes algorithm is trained and tested on the email dataset. Next, three datasets of different text lengths and two datasets of different sample sizes are selected from the email dataset, forming five experimental sample sets. Finally, several traditional machine learning models, neural network models, and pre-trained models are built and compared on the five sample sets. The results show that the pre-trained model ALBERT is best suited to classifying sentence-length Chinese spam, the traditional machine learning model SVM is best suited to paragraph-length Chinese spam, and the neural network model TextRCNN is best suited to document-length Chinese spam. The results also show that the neural network model TextRNN and the pre-trained model RoBERTa are not suitable for small-sample data.
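The first two steps described in the abstract (a naive Bayes baseline, then splitting emails by text length) can be illustrated with a minimal sketch. It is not the authors' implementation: the length thresholds `SENTENCE_MAX` and `PARAGRAPH_MAX`, the character-level tokenization, and the class name `CharNaiveBayes` are all assumptions made here for illustration, since the abstract does not give these details.

```python
import math
from collections import Counter, defaultdict

# Hypothetical thresholds in characters; the abstract does not state the real ones.
SENTENCE_MAX = 50
PARAGRAPH_MAX = 300


def length_bucket(text):
    """Assign a text to a sentence, paragraph, or document bucket by character count."""
    n = len(text)
    if n <= SENTENCE_MAX:
        return "sentence"
    if n <= PARAGRAPH_MAX:
        return "paragraph"
    return "document"


class CharNaiveBayes:
    """Multinomial naive Bayes over single characters with add-one smoothing.

    Character-level features are an assumption chosen to avoid a Chinese
    word segmenter; the paper may tokenize differently.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.char_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.char_counts[label].update(text)
        self.vocab = {ch for counts in self.char_counts.values() for ch in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            counts = self.char_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for ch in text:
                if ch in self.vocab:  # skip characters never seen in training
                    score += math.log((counts[ch] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In practice each length bucket would get its own training and test split, so that every model in the comparison is evaluated separately on sentence-length, paragraph-length, and document-length emails.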