Research on Recognition of Chinese Illegal Comments Based on BERT-RCNN (Cited by: 4)

Authors: WU Hao; PAN Shanliang[1] (School of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China)

Affiliation: [1] School of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China

Source: Journal of Chinese Information Processing (《中文信息学报》), 2022, No. 1, pp. 92-103.

Funding: Zhejiang Provincial Public Welfare Technology Application Research Program (2017C33001).

Abstract: Malicious attacks, chiefly in the form of cyberbullying, have led to a number of serious incidents, and the problem of illegal comments has drawn wide public attention. Current detection of illegal comments relies mainly on sensitive-word filtering, which cannot effectively identify malicious comments that contain no vulgar language. This paper builds a Chinese illegal-comment dataset through web crawling and manual annotation, and uses the pre-trained BERT model for word embedding so as to preserve the implicit semantic information of the text. On top of BERT, an RCNN combined with an attention mechanism further extracts the contextual features of each comment, and multi-task joint training is added to improve the model's classification accuracy and generalization ability. The model no longer depends entirely on a sensitive-word lexicon. Experimental results show that, compared with traditional models, the proposed model better captures semantic information and is better at uncovering latent malice: on the Chinese illegal-comment dataset it achieves a precision of 94.24%, which is 8.42% higher than a traditional TextRNN and 6.92% higher than a TextRNN with attention.
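
The abstract describes the architecture only at a high level. Below is a minimal PyTorch sketch of how a BERT-RCNN classifier with attention pooling and a multi-task head could be assembled, assuming bert-base-chinese as the pre-trained encoder; the hidden size, the attention-pooling formulation, the auxiliary head, and the example comment are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertRCNN(nn.Module):
    """Sketch of BERT word embeddings + RCNN-style context fusion + attention pooling."""
    def __init__(self, bert_name="bert-base-chinese", hidden=256,
                 num_labels=2, num_aux_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)      # contextual word embeddings
        d = self.bert.config.hidden_size
        self.rnn = nn.LSTM(d, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(d + 2 * hidden, hidden)         # RCNN: fuse embedding with BiLSTM context
        self.att = nn.Linear(hidden, 1)                       # additive attention scores
        self.cls_head = nn.Linear(hidden, num_labels)         # main task: illegal vs. normal comment
        self.aux_head = nn.Linear(hidden, num_aux_labels)     # auxiliary task head (assumed for multi-task learning)

    def forward(self, input_ids, attention_mask):
        emb = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        ctx, _ = self.rnn(emb)                                 # left/right context from the BiLSTM
        h = torch.tanh(self.proj(torch.cat([emb, ctx], dim=-1)))
        scores = self.att(h).squeeze(-1)
        scores = scores.masked_fill(attention_mask == 0, -1e9) # ignore padding positions
        alpha = torch.softmax(scores, dim=-1).unsqueeze(-1)
        rep = (alpha * h).sum(dim=1)                           # attention-pooled comment vector
        return self.cls_head(rep), self.aux_head(rep)

# Illustrative usage; the sample comment is made up.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
batch = tokenizer(["这条评论包含恶意攻击内容"], return_tensors="pt",
                  padding=True, truncation=True, max_length=128)
model = BertRCNN()
main_logits, aux_logits = model(batch["input_ids"], batch["attention_mask"])
# Multi-task joint training would sum the cross-entropy losses of both heads, e.g.
# loss = F.cross_entropy(main_logits, y_main) + lambda_aux * F.cross_entropy(aux_logits, y_aux)
```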

Keywords: illegal comment recognition; transfer learning; BERT pre-trained model

Classification Code: TP391 [Automation and Computer Technology / Computer Application Technology]

 
