检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:方瑞 于俊洋[1] 董李锋 FANG Rui;YU Junyang;DONG Lifeng(School of Software,Henan University,Kaifeng,Henan 475000,China;Henan Jiuyu Tenglong Information Engineering Co.,Ltd.,Zhengzhou 450000,China)
机构地区:[1]河南大学软件学院,河南开封475000 [2]河南九域腾龙信息工程有限公司,郑州450000
出 处:《计算机工程》2020年第8期271-276,共6页Computer Engineering
基 金:国家自然科学基金(61602525);河南省科技发展计划项目(182102210229)。
摘 要:在网络社交平台海量的信息文本中含有许多垃圾文本,这些文本的广泛散布影响了人们正常社交。为此,提出一种垃圾文本过滤模型。通过BERT模型提取文本的句编码,采用B-Feature方法对句编码进行特征构造,并根据文本与所得特征之间的联系进一步将该特征构造为特征矩阵,运用BP神经网络分类器对特征矩阵进行处理,检测出垃圾文本并进行过滤。实验结果表明,该模型在长、中、短文本数据集上的准确率较TFIDF-BP模型分别提高7.8%、3.8%和11.7%,在中、短文本数据集上的准确率较朴素贝叶斯模型分别提高2.1%和13.7%,能有效对垃圾文本进行分类和过滤。There are a lot of junk texts in the massive information of online social platforms,which hinder the normal social intercourse of people when they are widely spread.To address the problem,this paper proposes a junk text filtering model.The model uses the BERT model to extract sentence coding of the text.Then the feature of sentence coding is constructed by using the B-Feature method,and the obtained feature is further constructed as a feature matrix based on the relationship between the feature and the text.The feature matrix is processed by using a BP neural network classifier,and junk texts are detected and filtered.Experimental results show that the accuracy rate of the proposed model on text datasets of long,medium,and short length is respectively 7.8%,3.8%and 11.7%higher than that of the TFIDF-BP model,and the accuracy of the proposed model on text datasets of medium and short length is respectively 2.1%and 13.7%higher than that of the naive Bayes model,which can effectively classify and filter junk texts.
关 键 词:BERT模型 特征构造 BP神经网络 垃圾文本过滤 文本分类 句编码
分 类 号:TP391.1[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.185