Authors: SUN Shichang; WEI Shuang; MENG Jiana; LIN Hongfei [2]; XIAO Wenhao; LIU Shuang (School of Computer Science and Engineering, Dalian Minzu University, Dalian, Liaoning 116600, China; School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China)
Affiliations: [1] School of Computer Science and Engineering, Dalian Minzu University, Dalian, Liaoning 116600, China; [2] School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
Source: Journal of Chinese Information Processing (《中文信息学报》), 2024, Issue 12, pp. 170-180 (11 pages)
Funding: National Natural Science Foundation of China (61876031, 62076046).
Abstract: By exploiting the disentangled representations of StyleGANs and the semantic correspondences between modalities in multimodal pre-trained models, existing methods have achieved good results in cross-modal style transfer. However, the latent space of StyleGANs, built on image-scale decomposition, is poorly suited to editing local attributes, which causes interference with irrelevant regions during transfer. This paper proposes a fine-grained, text-guided cross-modal style transfer model that achieves locally controllable style transfer by exploiting the regional information contained in the prompt text. First, a BERT-based text semantic classification network locates the semantic regions referred to by the target style text; a feature mapping network then embeds the CLIP features of the target text into the latent space of SemanticStyleGAN. The combination of these two networks embeds the target text's CLIP features into an editable latent space at fine granularity. Finally, random perspective augmentation of the generated stylized images resolves the adversarial-generation problem during training. Experiments show that the proposed method generates images that more closely match the style described by the text and improves the regional accuracy of cross-modal editing.
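The region-masked editing step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the random weight matrices standing in for the trained BERT classifier and feature mapping network, and the function names are all assumptions; the real system operates on CLIP text features and SemanticStyleGAN's per-region latent codes.

```python
import numpy as np

rng = np.random.default_rng(0)

# All dimensions are illustrative assumptions.
CLIP_DIM, LATENT_DIM, NUM_REGIONS = 512, 64, 10

# Random stand-ins for the trained classifier / mapping-network weights.
W_cls = rng.normal(size=(NUM_REGIONS, CLIP_DIM)) / np.sqrt(CLIP_DIM)
W_map = rng.normal(size=(NUM_REGIONS, LATENT_DIM, CLIP_DIM)) / np.sqrt(CLIP_DIM)

def classify_regions(text_emb):
    """Stub for the BERT-based classifier: a 0/1 mask over the
    semantic regions that the style text mentions."""
    return (W_cls @ text_emb > 0).astype(float)

def map_features(text_emb):
    """Stub for the feature mapping network: one latent offset
    per semantic region, derived from the CLIP text feature."""
    return W_map @ text_emb                      # (NUM_REGIONS, LATENT_DIM)

def edit_latents(w, text_emb):
    """Apply text-driven offsets only to the selected regions,
    leaving unrelated regions untouched."""
    mask = classify_regions(text_emb)            # (NUM_REGIONS,)
    return w + mask[:, None] * map_features(text_emb)

text_emb = rng.normal(size=CLIP_DIM)             # placeholder CLIP text feature
w = rng.normal(size=(NUM_REGIONS, LATENT_DIM))   # source per-region latents
w_edit = edit_latents(w, text_emb)
mask = classify_regions(text_emb)

# Regions not selected by the classifier keep their original latents exactly.
assert np.allclose(w_edit[mask == 0], w[mask == 0])
```

The masking is what gives local controllability: because SemanticStyleGAN factors its latent code by semantic region, zeroing the offset for unmentioned regions prevents the interference with irrelevant parts that the abstract attributes to scale-decomposed StyleGAN spaces.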
Classification: TP391 [Automation and Computer Technology / Computer Application Technology]