A Chinese Spam Detection Method Based on Content and ERNIE3.0-CapsNet

Authors: Shan Chenling (单晨棱), Zhang Xinyou (张新有), Xing Huanlai (邢焕来) [1,2], Feng Li (冯力)

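Affiliations: [1] Tangshan Graduate School, Southwest Jiaotong University, Tangshan, Hebei 063000; [2] School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756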

Source: Journal of Information Security Research (《信息安全研究》), 2024, No. 3, pp. 233-240 (8 pages)

Funding: National Natural Science Foundation of China (62172342).

Abstract: Deep-learning methods for Chinese spam detection currently suffer from inadequate word-vector representations and limited richness in the features they extract. To address this, this paper proposes ERNIE3.0-CapsNet, an improved detection model that combines the ERNIE3.0 pre-trained model with a capsule neural network. For the content text of Chinese spam, ERNIE3.0 generates a semantically rich word-vector matrix that carries its strong knowledge memorization and reasoning abilities; a capsule neural network then performs feature extraction and classification. The capsule network's structure is improved, and GELU is adopted as the activation function of its dynamic routing. Comparative experiments are designed against five similar models and across four activation functions. On the open-source TREC06C Chinese email dataset, ERNIE3.0-CapsNet performs strongly overall, reaching an accuracy of 99.45%. The results show that ERNIE3.0-CapsNet outperforms methods such as ERNIE3.0-TextCNN and ERNIE3.0-RNN, demonstrating the model's effectiveness for Chinese spam detection.
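The abstract names two mechanisms without spelling them out: routing by agreement inside the capsule layer, and GELU used as the activation of that routing. The sketch below is a minimal pure-Python illustration of the standard dynamic-routing loop (Sabour et al., 2017), not the authors' implementation. The abstract does not say exactly where GELU enters the routing, so the pluggable `activation` parameter and the `gelu_vec` variant (exact GELU applied elementwise in place of squash) are assumptions. The prediction vectors `u_hat`, the capsule counts and the random inputs are hypothetical stand-ins for vectors that, in the paper's pipeline, would be derived from ERNIE3.0 word vectors.

```python
import math
import random


def squash(s):
    """Standard capsule squash: keeps the direction of s, maps its length to |s|^2 / (1 + |s|^2)."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0 for _ in s]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def gelu_vec(s):
    """Exact GELU, x * Phi(x) with Phi the standard normal CDF, applied elementwise.

    Hypothetical drop-in for squash; the abstract does not specify GELU's placement.
    """
    return [0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))) for x in s]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def length(vec):
    return math.sqrt(sum(x * x for x in vec))


def dynamic_routing(u_hat, iterations=3, activation=squash):
    """Routing by agreement.

    u_hat[i][j] is the prediction vector from input capsule i for output capsule j
    (i.e. W_ij @ u_i). Returns output capsules v[j] and the coupling coefficients
    c[i][j] that produced them.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    for _ in range(iterations):
        c = [softmax(row) for row in b]  # each input capsule splits its vote across outputs
        v = []
        for j in range(n_out):
            # weighted sum of predictions for output capsule j
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(activation(s_j))
        # raise the logit wherever a prediction agrees (dot product) with the current output
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c


# Usage with random predictions: 6 input capsules, 2 class capsules (spam, ham), 4-dim vectors.
random.seed(0)
u_hat = [[[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)] for _ in range(6)]
v, c = dynamic_routing(u_hat)
scores = [length(v_j) for v_j in v]  # capsule length read as a class score
v_gelu, _ = dynamic_routing(u_hat, activation=gelu_vec)  # hypothetical GELU variant
```

With the default squash, each output capsule's length lies in [0, 1) and can be read as a class score; with the elementwise GELU variant that bound no longer holds, so how GELU is actually wired into the routing affects how the outputs should be interpreted.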

Keywords: Chinese spam; ERNIE3.0; capsule neural network; activation function; text classification

Classification number: T309 [General Industrial Technology]

 
