Junk Text Filtering Model Based on Feature Matrix Construction and BP Neural Network

Cited by: 6

Authors: FANG Rui; YU Junyang [1]; DONG Lifeng

Affiliations: [1] School of Software, Henan University, Kaifeng, Henan 475000, China; [2] Henan Jiuyu Tenglong Information Engineering Co., Ltd., Zhengzhou 450000, China

Source: Computer Engineering (《计算机工程》), 2020, No. 8, pp. 271-276 (6 pages)

Funding: National Natural Science Foundation of China (61602525); Henan Province Science and Technology Development Program (182102210229).

Abstract: Online social platforms carry a large volume of junk text mixed into their massive information flow, and its wide spread disrupts normal social interaction. To address this problem, this paper proposes a junk text filtering model. The model uses BERT to extract a sentence encoding of each text, constructs features from the sentence encoding with the B-Feature method, and then builds those features into a feature matrix according to the relationship between the text and the obtained features. A BP neural network classifier processes the feature matrix to detect and filter junk texts. Experimental results show that the accuracy of the proposed model on long, medium, and short text datasets is 7.8%, 3.8%, and 11.7% higher, respectively, than that of the TFIDF-BP model, and that on medium and short text datasets it is 2.1% and 13.7% higher, respectively, than that of the naive Bayes model, indicating that the model can effectively classify and filter junk texts.
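The final stage of the pipeline described in the abstract, a BP (backpropagation) neural network classifying fixed-length feature vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the B-Feature method, the feature-matrix layout, or the network size, so the example below substitutes synthetic vectors for BERT sentence encodings and uses a single hidden layer. The names `BPNetwork`, `make_toy_data`, and `run_demo`, and all hyperparameters, are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNetwork:
    """One-hidden-layer feedforward network trained by backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden activations, then a single sigmoid output in (0, 1).
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        # Cross-entropy loss with a sigmoid output gives this output error.
        delta_out = y - target
        # Propagate the error back to the hidden layer (using w2 before update).
        delta_hidden = [delta_out * w * hi * (1.0 - hi)
                        for w, hi in zip(self.w2, h)]
        for j, hi in enumerate(h):
            self.w2[j] -= lr * delta_out * hi
        self.b2 -= lr * delta_out
        for j, dh in enumerate(delta_hidden):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh * xi
            self.b1[j] -= lr * dh

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0


def make_toy_data(n, dim, rng):
    """Hypothetical stand-in for sentence encodings: two Gaussian clusters.

    Label 1 plays the role of junk text, label 0 of normal text.
    """
    data = []
    for i in range(n):
        label = i % 2
        center = 1.0 if label == 1 else -1.0
        data.append(([rng.gauss(center, 0.5) for _ in range(dim)], label))
    return data


def run_demo(seed=42):
    rng = random.Random(seed)
    train = make_toy_data(200, 4, rng)
    test = make_toy_data(100, 4, rng)
    net = BPNetwork(n_in=4, n_hidden=5, seed=seed)
    for _ in range(20):  # epochs of per-sample gradient descent
        rng.shuffle(train)
        for x, label in train:
            net.train_step(x, label, lr=0.1)
    correct = sum(net.predict(x) == label for x, label in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"held-out accuracy on toy data: {run_demo():.2f}")
```

In a real setting the toy vectors would be replaced by features derived from BERT sentence encodings, and the accuracy printed here says nothing about the results reported in the paper.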

Keywords: BERT model; feature construction; BP neural network; junk text filtering; text classification; sentence encoding

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
