Junk Text Filtering Model Based on Feature Matrix Construction and BP Neural Network (cited by: 6)


Authors: FANG Rui (方瑞); YU Junyang (于俊洋) [1]; DONG Lifeng (董李锋) (School of Software, Henan University, Kaifeng, Henan 475000, China; Henan Jiuyu Tenglong Information Engineering Co., Ltd., Zhengzhou 450000, China)

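Affiliations: [1] School of Software, Henan University, Kaifeng, Henan 475000, China; [2] Henan Jiuyu Tenglong Information Engineering Co., Ltd., Zhengzhou 450000, China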

Source: Computer Engineering (《计算机工程》), 2020, No. 8, pp. 271-276 (6 pages)

Funding: National Natural Science Foundation of China (61602525); Henan Province Science and Technology Development Program (182102210229).

Abstract: The massive volume of text on online social platforms contains a large amount of junk text, and its wide circulation disrupts normal social interaction. To address this problem, this paper proposes a junk text filtering model. The model uses BERT to extract sentence encodings of the text, constructs features from those encodings with the B-Feature method, and then builds a feature matrix from the relationship between the text and the resulting features. A BP neural network classifier processes the feature matrix to detect and filter junk text. Experimental results show that the model's accuracy on long, medium, and short text datasets is 7.8%, 3.8%, and 11.7% higher, respectively, than that of the TFIDF-BP model, and that its accuracy on medium and short text datasets is 2.1% and 13.7% higher, respectively, than that of the naive Bayes model. The model can therefore classify and filter junk text effectively.
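The classification stage named in the abstract can be illustrated with a minimal sketch of a BP (backpropagation) neural network. The abstract does not describe the B-Feature method or the network configuration, so everything below is an assumption made for illustration: synthetic 8-dimensional vectors stand in for BERT sentence encodings, the B-Feature step is omitted, and the names `make_toy_vectors`, `BPNetwork`, `train`, and `accuracy`, along with the layer sizes and learning rate, are hypothetical. It is not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow on extreme values.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def make_toy_vectors(n, dim, seed=0):
    """Synthetic stand-in for sentence encodings (assumption, not BERT output).

    Label 1 ("junk") vectors cluster around +1, label 0 around -1.
    """
    rng = random.Random(seed)
    data = []
    for k in range(n):
        label = k % 2
        center = 1.0 if label == 1 else -1.0
        data.append(([rng.gauss(center, 0.5) for _ in range(dim)], label))
    return data


class BPNetwork:
    """Feed-forward network with one hidden layer, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden activations, then a single sigmoid output in (0, 1).
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr=0.1):
        h, y = self.forward(x)
        # Output-layer error for cross-entropy loss with a sigmoid output.
        d_out = y - target
        # Propagate the error back through the (not yet updated) output weights.
        d_hid = [d_out * w * hi * (1.0 - hi) for w, hi in zip(self.w2, h)]
        for j, hi in enumerate(h):
            self.w2[j] -= lr * d_out * hi
        self.b2 -= lr * d_out
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj

    def predict(self, x):
        # 1 means "junk", 0 means "normal" in this toy setup.
        return 1 if self.forward(x)[1] >= 0.5 else 0


def train(net, data, epochs=30, lr=0.1, seed=0):
    rng = random.Random(seed)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)  # stochastic gradient descent, one sample at a time
        for x, label in samples:
            net.train_step(x, label, lr)


def accuracy(net, data):
    return sum(net.predict(x) == label for x, label in data) / len(data)


if __name__ == "__main__":
    train_set = make_toy_vectors(200, 8, seed=1)
    test_set = make_toy_vectors(100, 8, seed=2)
    net = BPNetwork(n_in=8, n_hidden=4)
    train(net, train_set)
    print("held-out accuracy:", accuracy(net, test_set))
```

In a real pipeline the toy vectors would be replaced by features derived from actual sentence encodings, and the accuracy figures reported in the abstract would not be expected from this toy setup.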

Keywords: BERT model; feature construction; BP neural network; junk text filtering; text classification; sentence encoding

CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
