Chinese Hate Speech Detection Model Based on Text Hypergraph Construction


Authors: ZHANG Shunxiang [1,2]; WANG Yanhui; LI KuanChing; ZHOU Yuhao; LI Jiawei

Affiliations: [1] School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232001, China; [2] Artificial Intelligence Research Institute of Hefei Comprehensive National Science Center, Hefei, Anhui 230026, China; [3] School of Mathematics and Big Data, Anhui University of Science and Technology, Huainan, Anhui 232001, China

Source: Journal of Anhui University of Science and Technology: Natural Science (《安徽理工大学学报(自然科学版)》), 2024, No. 4, pp. 77-88 (12 pages)

Funding: National Natural Science Foundation of China, General Program (62076006); Anhui University Collaborative Innovation Fund project (GXXT-2021-008).

Abstract: Objective: Hate speech detection determines whether a text carries a hateful tendency, which helps filter inappropriate content online and maintain a safe and orderly network environment. Existing hate speech detection methods rely on graph structures built from a single feature and struggle to capture the complex semantics that arise from implicit references to target objects and from rhetorical devices, which results in low detection accuracy. Methods: A Chinese hate speech detection model based on text hypergraph construction is proposed. The model builds a text hypergraph from the word-order and syntactic information in the text, together with semantic extension information that a large language model provides for the target objects, in order to improve detection. First, a prompt template is designed to guide the large language model to identify the target objects in the text and to supplement them with knowledge, which serves as the semantic extension information of the text. Next, the text hypergraph is constructed to mine the implicit semantic structures and associations in the text, and a hypergraph attention mechanism aggregates the hypergraph information into global features. In parallel, roberta-wwm-ext extracts dynamic features from the original text to obtain text features. Finally, a cross-attention mechanism fuses the text features with the global features, and a sigmoid function computes the hate tendency used to detect hate speech. Results: In experiments on COLDataset, the method achieved good results and improved precision and F1 score. Conclusion: The experimental results indicate that the model effectively improves the detection of Chinese hate speech.
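
The abstract does not say how the hyperedges are defined, so the following is only a minimal sketch of one plausible construction, not the authors' implementation. It assumes three hyperedge types: sliding windows over the token sequence (word order), token groups supplied by an external syntactic parser, and one hyperedge per target object that links the target token to knowledge terms returned by a large language model. The function names, the `window` parameter, and the placeholder tokens are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def sliding_window_edges(n_tokens: int, window: int) -> List[List[int]]:
    """Word-order hyperedges: each run of `window` consecutive tokens is one hyperedge."""
    if n_tokens == 0:
        return []
    if n_tokens <= window:
        return [list(range(n_tokens))]
    return [list(range(i, i + window)) for i in range(n_tokens - window + 1)]


def build_incidence(n_nodes: int, edges: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return the |V| x |E| incidence matrix H, where H[v][e] = 1 if node v is in hyperedge e."""
    H = [[0] * len(edges) for _ in range(n_nodes)]
    for e, members in enumerate(edges):
        for v in members:
            H[v][e] = 1
    return H


def build_text_hypergraph(
    tokens: Sequence[str],
    syntax_groups: Sequence[Sequence[int]],
    expansions: Dict[int, List[str]],
    window: int = 3,
) -> Tuple[List[str], List[List[int]]]:
    """Assemble nodes and the incidence matrix from the three assumed hyperedge types.

    tokens        -- tokenized text; each token becomes a node
    syntax_groups -- token-index groups from a parser (assumed to be given)
    expansions    -- target token index -> knowledge terms from an LLM (assumed to be given)
    """
    nodes = list(tokens)
    edges = sliding_window_edges(len(tokens), window)
    edges += [list(group) for group in syntax_groups]
    for target_idx, terms in expansions.items():
        start = len(nodes)
        nodes.extend(terms)  # each knowledge term is added as a new node
        edges.append([target_idx] + list(range(start, len(nodes))))
    return nodes, build_incidence(len(nodes), edges)


# Hypothetical input: four placeholder tokens, two parser groups, and two
# knowledge terms attached to the target token at index 1.
nodes, H = build_text_hypergraph(
    tokens=["t0", "t1", "t2", "t3"],
    syntax_groups=[[0, 1], [1, 2, 3]],
    expansions={1: ["k0", "k1"]},
    window=3,
)
```

In this sketch the incidence matrix `H` is the object a hypergraph attention layer would consume: a node aggregates over the columns (hyperedges) it belongs to, and a hyperedge aggregates over its member rows (nodes). Because a hyperedge can join any number of nodes, the knowledge terms for a target object can be tied to it in a single edge instead of many pairwise ones.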

Keywords: hate speech detection; text hypergraph; large language model; roberta-wwm-ext

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
