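Authors: WU Hao (吴浩); PAN Shanliang (潘善亮) [1] (School of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China)
Affiliation: [1] College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211, China
Source: Journal of Chinese Information Processing (《中文信息学报》), 2022, No. 1, pp. 92-103 (12 pages)
Funding: Zhejiang Provincial Public Welfare Technology Application Research Program (2017C33001).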
Abstract: Malicious attacks, chiefly in the form of online abuse, have led to a number of serious incidents, and the problem of illegal (rule-violating) comments has drawn broad public attention. Current detection of such comments relies mainly on sensitive-word screening, which cannot effectively identify malicious comments that contain no vulgar language. This paper builds a dataset of Chinese illegal comments through web crawling and manual annotation, and uses the pre-trained BERT model for word embedding so as to retain the implicit semantic information of the text. On top of BERT, an RCNN combined with an attention mechanism further extracts contextual features of the comments, and multi-task joint training is added to improve the model's classification accuracy and generalization ability. The model no longer depends entirely on a sensitive-word lexicon. Experimental results show that the proposed model understands semantic information better than traditional models, which helps uncover latent malice. On the Chinese illegal comment dataset it achieves a precision of 94.24%, which is 8.42% higher than the traditional TextRNN and 6.92% higher than TextRNN combined with an attention mechanism.
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
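Illustrative note: the abstract's starting point is that sensitive-word screening misses malicious comments that contain no vulgar terms. The following is a minimal sketch of such a keyword baseline, not the authors' code. The word list, the sample comments, and the name `keyword_filter` are all hypothetical and chosen only to show the failure mode.

```python
from typing import Iterable

# Hypothetical lexicon, invented purely for illustration.
SENSITIVE_WORDS = ("idiot", "trash")


def keyword_filter(comment: str, lexicon: Iterable[str] = SENSITIVE_WORDS) -> bool:
    """Return True if the comment contains any term from the lexicon."""
    text = comment.lower()
    return any(word in text for word in lexicon)


# Hypothetical sample comments.
comments = [
    "You are an idiot.",                        # explicit insult, term is in the lexicon
    "People like you should just disappear.",   # hostile, but no listed term
    "Thanks, this was a helpful explanation.",  # benign
]

# One flag per comment: True means the keyword baseline would block it.
flags = [keyword_filter(c) for c in comments]
print(flags)
```

The second comment is hostile yet passes the filter because none of its words are in the lexicon. This is the gap the abstract says a semantic model (BERT embeddings followed by an attention-based RCNN) is meant to close.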