Fine-granularity Text-Guided Cross-modality Style Transfer


Authors: SUN Shichang [1], WEI Shuang [1], MENG Jiana [1], LIN Hongfei [2], XIAO Wenhao [1], LIU Shuang [1]

Affiliations: [1] School of Computer Science and Engineering, Dalian Minzu University, Dalian, Liaoning 116600, China; [2] School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China

Published in: Journal of Chinese Information Processing, 2024, No. 12, pp. 170-180 (11 pages)

Funding: National Natural Science Foundation of China (61876031, 62076046).

Abstract: By exploiting the disentangled representations of StyleGANs and the semantic correspondences between modalities in multimodal pre-trained models, existing methods have achieved good results in cross-modality style transfer. However, the latent space of StyleGANs, which is built on image-scale decomposition, is ill-suited to editing local attributes, so transfer can interfere with irrelevant regions. This paper proposes a fine-granularity text-guided cross-modality style transfer model that achieves locally controllable style transfer by exploiting the regional information contained in the prompt text. First, a BERT-based text semantic classification network locates the semantic regions referred to by the target style text; a feature mapping network then embeds the CLIP features of the target text into the latent space of SemanticStyleGAN. The combination of the two networks embeds the CLIP features of the target text into the editable latent space at a fine granularity. Finally, random perspective augmentation of the generated stylized images resolves the adversarial-generation problem during training. Experiments show that the proposed method generates images that more closely match the style described by the prompt text and improves the regional accuracy of cross-modality editing.
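The region-gated editing step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the stand-in random weights, the single-layer mapping, and the probability threshold are all assumptions; in the actual model the classifier is BERT-based, the text feature comes from CLIP, and the latent codes belong to SemanticStyleGAN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper does not state these values.
CLIP_DIM = 512      # size of the CLIP text embedding
LATENT_DIM = 256    # per-region latent code size (assumed)
NUM_REGIONS = 8     # number of editable semantic regions (e.g. hair, eyes, ...)

# Stand-in weights for the two learned components named in the abstract:
# 1) a text semantic classifier scoring which regions the prompt mentions,
# 2) a feature mapping network turning the CLIP feature into latent offsets.
W_cls = rng.normal(scale=0.02, size=(CLIP_DIM, NUM_REGIONS))
W_map = rng.normal(scale=0.02, size=(CLIP_DIM, NUM_REGIONS * LATENT_DIM))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fine_grained_edit(clip_feat, base_latents, threshold=0.1):
    """Apply latent offsets only to the regions the classifier selects.

    clip_feat    : (CLIP_DIM,) text embedding of the style prompt
    base_latents : (NUM_REGIONS, LATENT_DIM) per-region latent codes
    """
    region_probs = softmax(clip_feat @ W_cls)                 # which regions?
    offsets = (clip_feat @ W_map).reshape(NUM_REGIONS, LATENT_DIM)
    mask = (region_probs > threshold).astype(float)[:, None]  # gate the edit
    return base_latents + mask * offsets                      # others untouched

clip_feat = rng.normal(size=CLIP_DIM)
base = rng.normal(size=(NUM_REGIONS, LATENT_DIM))
edited = fine_grained_edit(clip_feat, base)
```

The gating mask is the key point: regions whose classifier score stays below the threshold keep their original latent code exactly, which is how the method avoids disturbing parts of the image the prompt does not mention.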

Keywords: style transfer; multimodal pre-trained model; text semantic classification

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]

 
