Authors: SHI Xiayang (师夏阳); ZHANG Fengyuan (张风远); YUAN Jiaqi (袁嘉琪); HUANG Min (黄敏) (College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou, Henan 450001, China; College of Mathematics and Information Science, Zhengzhou University of Light Industry, Zhengzhou, Henan 450001, China)
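Affiliations: [1] College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001 [2] College of Mathematics and Information Science, Zhengzhou University of Light Industry, Zhengzhou 450001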
Source: Journal of Computer Applications (《计算机应用》), 2022, No. 11, pp. 3379-3385 (7 pages)
Funding: Henan Province Key Research and Development and Promotion Special Project (212102210547).
Abstract: Offensive speech has a serious negative impact on social stability. Automatic detection of offensive speech currently focuses on a few high-resource languages, and the lack of sufficient annotated offensive-speech corpora makes detection difficult for low-resource languages. To address this problem, a cross-language unsupervised offensiveness transfer detection method is proposed. First, the multilingual BERT (multilingual Bidirectional Encoder Representations from Transformers, mBERT) model is used to learn offensive features on a high-resource English dataset, yielding an original model. Then, by analyzing the linguistic similarity between English and Danish, Arabic, Turkish, and Greek, the original model is transferred to these four low-resource languages to detect offensive speech in them automatically. Experimental results show that, compared with four methods, BERT, Linear Regression (LR), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP), the proposed method improves both the accuracy and the F1 score of offensive speech detection on Danish, Arabic, Turkish, and Greek by nearly 2 percentage points, approaching the performance of current supervised detection. This indicates that combining cross-language model transfer learning with transfer detection can achieve unsupervised offensiveness detection for low-resource languages.
Keywords: cross-language model; offensive speech detection; BERT; unsupervised method; transfer learning
Classification number: TP391. [Automation and Computer Technology: Computer Application Technology]