Purging diffusion models through CLIP-based fine-tuning


Authors: WU Ping [1]; LIN Xin [1] (School of Computer Science and Technology, East China Normal University, Shanghai 200062, China)

Affiliation: [1] School of Computer Science and Technology, East China Normal University, Shanghai 200062, China

Source: Journal of East China Normal University (Natural Science), 2025, No. 1, pp. 138-150 (13 pages)

Funding: Open Project of the Key Laboratory of Advanced Theory and Application in Statistics and Data Science, Ministry of Education; Science and Technology Commission of Shanghai Municipality Project (21511100101).

Abstract: Diffusion models have revolutionized text-to-image synthesis, enabling end users to generate high-quality and imaginative artworks from simple natural-language text prompts. Unfortunately, because the training datasets are large and unfiltered, text-to-image models are capable of generating inappropriate content such as nudity and violence. To deploy such models at a higher level of safety, we propose a novel method, directional contrastive language-image pre-training (CLIP) loss-based fine-tuning, dubbed CLIF. This method uses a directional CLIP loss to fine-tune the model and suppress its ability to generate inappropriate content. CLIF is computationally lightweight and immune to circumvention. To demonstrate its effectiveness, we propose a benchmark called categorized toxic prompts (CTP) for evaluating the inappropriate-content generation ability of text-to-image diffusion models. As shown by our experiments on the CTP and common objects in context (COCO) datasets, CLIF significantly suppresses unsafe generation by text-to-image diffusion models while preserving their ability to produce general content.
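The abstract does not spell out the loss, but a "directional" CLIP loss typically aligns the shift in image embeddings (fine-tuned vs. original output) with the shift in text embeddings (safe target prompt vs. unsafe source prompt), penalizing one minus their cosine similarity. The following is a minimal sketch under that assumption, with toy stand-in vectors in place of real CLIP encoder outputs; the paper's exact objective may differ, and all variable names and values here are hypothetical:

```python
import math

def _unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def directional_clip_loss(img_src, img_ft, txt_src, txt_tgt):
    """1 - cosine similarity between the image-embedding shift
    (fine-tuned minus original) and the text-embedding shift
    (target prompt minus source prompt)."""
    d_img = _unit([a - b for a, b in zip(img_ft, img_src)])
    d_txt = _unit([a - b for a, b in zip(txt_tgt, txt_src)])
    return 1.0 - sum(a * b for a, b in zip(d_img, d_txt))

# Toy 4-d stand-ins for CLIP embeddings (hypothetical values).
img_src = [1.0, 0.0, 0.0, 0.0]   # image embedding before fine-tuning
txt_src = [0.9, 0.1, 0.0, 0.0]   # embedding of the unsafe source prompt
txt_tgt = [0.0, 1.0, 0.0, 0.0]   # embedding of the safe target prompt

# If fine-tuning shifts the image embedding exactly along the
# text direction, the directional loss vanishes.
step = _unit([a - b for a, b in zip(txt_tgt, txt_src)])
img_ft = [a + b for a, b in zip(img_src, step)]
loss = directional_clip_loss(img_src, img_ft, txt_src, txt_tgt)
print(loss)
```

Here the printed loss is zero up to floating-point error, since the image shift was constructed to be parallel to the text shift; an orthogonal shift would give a loss of 1, and an opposing shift a loss of 2.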

Keywords: text-to-image generation model; safety; dataset; diffusion model

Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
