Research on the Detoxification Task of Chinese Texts (中文文本去毒任务的研究)

Authors: LIU Jiangsheng; ZUO Jiali [1]; HU Yuting; WAN Jianyi [1]; WANG Mingwen [1] (School of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, China)

Affiliation: [1] School of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022

Source: Journal of Shanxi University (Natural Science Edition), 2024, No. 3, pp. 528-538 (11 pages)

Funding: National Natural Science Foundation of China (61866018).

Abstract: This paper studies how to effectively remove toxicity from Chinese text. For this task, the paper reconstructs a Chinese toxicity corpus as the data basis for the study. Based on this dataset, the paper examines the forms in which toxicity manifests in text and analyzes the causes of specific categories of toxic text. Building on this analysis, the paper applies two types of text style transfer models, edit-based and generative, to text detoxification, and further investigates how large language models perform at removing toxicity under different prompts. The experimental results show that the edit-based model effectively removes toxicity from explicitly toxic text while preserving much of the original content, whereas text produced by the generative model is more fluent. Prompt-based large language models can remove sentence toxicity to some extent, but compared with dedicated style transfer models, the detoxification ability of small-parameter large language models still needs improvement.
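To make the edit-based idea in the abstract concrete, the following is a minimal sketch, not the paper's model: it replaces explicit toxic terms found in a lexicon and scores how much of the source survives. The lexicon `TOXIC_LEXICON`, the functions `edit_detox` and `content_preservation`, and the English toy data are all hypothetical. For Chinese input the substring replacement works unchanged, but the whitespace tokenization in the overlap score would need a word segmenter, which is assumed away here.

```python
import re

# Hypothetical toy lexicon mapping explicit toxic terms to neutral substitutes.
# The paper's actual corpus and models are not reproduced here.
TOXIC_LEXICON = {"stupid": "poor", "idiot": "person"}


def edit_detox(sentence, lexicon=TOXIC_LEXICON):
    """Replace lexicon terms in the sentence; everything else is left untouched."""
    if not lexicon:
        return sentence
    # Try longer terms first so a short term never pre-empts a longer one.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return pattern.sub(lambda m: lexicon[m.group(0).lower()], sentence)


def content_preservation(source, output):
    """Fraction of whitespace-separated source tokens that also occur in the output.

    This is a crude proxy chosen for illustration, not the paper's metric.
    """
    src_tokens = source.split()
    if not src_tokens:
        return 1.0
    out_tokens = set(output.split())
    return sum(tok in out_tokens for tok in src_tokens) / len(src_tokens)


if __name__ == "__main__":
    src = "That was a stupid idea"
    out = edit_detox(src)
    print(out)
    print(content_preservation(src, out))
```

Because only lexicon hits are edited, most source tokens are kept, which matches the abstract's observation that edit-based methods preserve content well on explicit toxicity. The same design cannot handle implicit toxicity, where no single term carries the offence.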

Keywords: text style transfer; text detoxification; large language model

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
