Detection of unsupervised offensive speech based on multilingual BERT

Cited by: 5

Authors: SHI Xiayang (师夏阳); ZHANG Fengyuan (张风远); YUAN Jiaqi (袁嘉琪); HUANG Min (黄敏) (College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou, Henan 450001, China; College of Mathematics and Information Science, Zhengzhou University of Light Industry, Zhengzhou, Henan 450001, China)

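Affiliations: [1] College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001 [2] College of Mathematics and Information Science, Zhengzhou University of Light Industry, Zhengzhou 450001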

Source: Journal of Computer Applications (《计算机应用》), 2022, No. 11, pp. 3379-3385 (7 pages)

Funding: Henan Province Key R&D and Promotion Special Project (212102210547).

Abstract: Offensive speech has a serious negative impact on social stability. Automatic detection of offensive speech currently focuses on a few high-resource languages, and the lack of sufficiently large annotated offensive-speech corpora makes detection difficult for low-resource languages. To address this problem, a cross-language unsupervised offensiveness transfer detection method is proposed. First, the multilingual BERT (multilingual Bidirectional Encoder Representations from Transformers, mBERT) model learns offensive features on a high-resource English dataset, yielding an original model. Then, based on an analysis of the language similarity between English and Danish, Arabic, Turkish, and Greek, the original model is transferred to these four low-resource languages to detect offensive speech in them automatically. Experimental results show that, compared with the four methods BERT, Linear Regression (LR), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP), the proposed method improves both the accuracy and the F1 score of offensive speech detection in Danish, Arabic, Turkish, and Greek by nearly 2 percentage points, approaching current supervised detection. This shows that combining cross-language model transfer learning with transfer detection can achieve unsupervised offensiveness detection for low-resource languages.
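The abstract reports results as accuracy and F1 score. As a minimal sketch of how these two metrics can be computed for a binary offensive / not-offensive task, the snippet below uses plain Python. The function names, the 0/1 label encoding, and the choice of macro-averaged F1 are assumptions made for illustration; the abstract does not say which F1 averaging the authors used.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1 score treating `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             labels: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical gold labels and predictions (1 = offensive, 0 = not offensive).
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 0, 1]
acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
```

In a zero-shot transfer setting such as the one described, `pred` would come from a model fine-tuned only on English data and applied unchanged to the target-language test set.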

Keywords: cross-language model; offensive speech detection; BERT; unsupervised method; transfer learning

Classification number: TP391.[Automation and Computer Technology - Computer Application Technology]
