Audio-guided implicit neural representation for local image stylization

Authors: Seung Hyun Lee, Sieun Kim, Wonmin Byeon, Gyeongrok Oh, Sumin In, Hyeongcheol Park, Sang Ho Yoon, Sung-Hee Hong, Jinkyu Kim, Sangpil Kim

Affiliations: [1] Department of Artificial Intelligence, Korea University, Seoul 02841, Republic of Korea; [2] NVIDIA Research, Nvidia Corporation, Santa Clara, CA 95051, USA; [3] Graduate School of Culture Technology, KAIST, Seoul 34141, Republic of Korea; [4] Hologram Research Center, Korea Electronics Technology Institute, Seoul 03924, Republic of Korea; [5] Department of Computer Science and Engineering, Korea University, Seoul 02841, Republic of Korea

Source: Computational Visual Media (English edition), 2024, Issue 6, pp. 1185-1204 (20 pages)

Funding: Supported by the Culture, Sports and Tourism R&D Program through the Korea Creative Content Agency grant funded by the Ministry of Culture, Sports and Tourism in 2022 (4D Content Generation and Copyright Protection with Artificial Intelligence, R2022020068, 30%; Research on neural watermark technology for copyright protection of generative AI 3D content, RS-2024-00348469, 40%; International Collaborative Research and Global Talent Development for the Development of Copyright Management and Protection Technologies for Generative AI, RS-2024-00345025, 10%); and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2019-II190079, 10%; No. 2017-0-00417, 10%).

Abstract: We present a novel framework for audio-guided localized image stylization. Sound often provides information about the specific context of a scene and is closely related to a certain part of the scene or object. However, existing image stylization works have focused on stylizing the entire image using an image or text input. Stylizing a particular part of the image based on audio input is natural but challenging. This work proposes a framework in which a user provides one audio input to localize the target in the input image and another to locally stylize the target object or scene. We first produce a fine localization map using an audio-visual localization network that leverages the CLIP embedding space. We then utilize an implicit neural representation (INR) along with the predicted localization map to stylize the target based on sound information. The INR manipulates local pixel values to be semantically consistent with the provided audio input. Our experiments show that the proposed framework outperforms other audio-guided stylization methods. Moreover, we observe that our method constructs concise localization maps and naturally manipulates the target object or scene in accordance with the given audio input.
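
The two-stage idea in the abstract (localize the target with one audio input, then restyle only that region) can be illustrated with a minimal NumPy sketch. It assumes, purely for illustration, that an audio embedding and per-patch image embeddings already live in a shared CLIP-like space, that a localization map can be approximated by their min-max-normalized cosine similarity, and that the stylized result is composited with the original image through that map. The functions `localization_map` and `blend_locally` are hypothetical names; this is not the authors' trained localization network, and the INR stylizer is replaced by a placeholder stylized image.

```python
import numpy as np


def localization_map(audio_emb, patch_embs, eps=1e-8):
    """Hypothetical localization: cosine similarity between one audio
    embedding of shape (D,) and patch embeddings of shape (H, W, D),
    min-max normalized to [0, 1]. Not the paper's trained network."""
    a = audio_emb / (np.linalg.norm(audio_emb) + eps)
    p = patch_embs / (np.linalg.norm(patch_embs, axis=-1, keepdims=True) + eps)
    sim = p @ a  # (H, W) cosine similarities in [-1, 1]
    lo, hi = sim.min(), sim.max()
    return (sim - lo) / (hi - lo + eps)


def blend_locally(image, stylized, mask):
    """Assumed compositing: stylized where mask is 1, original where mask is 0."""
    m = mask[..., None]  # (H, W) -> (H, W, 1) to broadcast over color channels
    return m * stylized + (1.0 - m) * image


# Toy data: random patch embeddings, with one patch aligned to the audio.
rng = np.random.default_rng(0)
dim = 8
audio = rng.normal(size=dim)
patches = rng.normal(size=(4, 4, dim))
patches[1, 2] = 3.0 * audio  # same direction as the audio embedding

mask = localization_map(audio, patches)
image = np.zeros((4, 4, 3))    # original image (all black)
stylized = np.ones((4, 4, 3))  # placeholder for an INR-stylized image (all white)
out = blend_locally(image, stylized, mask)
```

Using a soft map instead of a hard threshold keeps the transition between restyled and untouched pixels smooth; in a real system both the toy embeddings and the placeholder stylized image would come from learned models.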

Keywords: audio guidance; image style transfer; implicit neural representations (INR)

CLC number: TP3 [Automation and Computer Technology: Computer Science and Technology]
