Audio-guided implicit neural representation for local image stylization


Authors: Seung Hyun Lee, Sieun Kim, Wonmin Byeon, Gyeongrok Oh, Sumin In, Hyeongcheol Park, Sang Ho Yoon, Sung-Hee Hong, Jinkyu Kim, Sangpil Kim

Affiliations: [1] Department of Artificial Intelligence, Korea University, Seoul 02841, Republic of Korea; [2] NVIDIA Research, Nvidia Corporation, Santa Clara, CA 95051, USA; [3] Graduate School of Culture Technology, KAIST, Seoul 34141, Republic of Korea; [4] Hologram Research Center, Korea Electronics Technology Institute, Seoul 03924, Republic of Korea; [5] Department of Computer Science and Engineering, Korea University, Seoul 02841, Republic of Korea

Source: Computational Visual Media, 2024, No. 6, pp. 1185-1204 (20 pages)

Funding: Supported by the Culture, Sports and Tourism R&D Program through the Korea Creative Content Agency grant funded by the Ministry of Culture, Sports and Tourism in 2022 (4D Content Generation and Copyright Protection with Artificial Intelligence, R2022020068, 30%; Research on Neural Watermark Technology for Copyright Protection of Generative AI 3D Content, RS-2024-00348469, 40%; International Collaborative Research and Global Talent Development for the Development of Copyright Management and Protection Technologies for Generative AI, RS-2024-00345025, 10%); and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT) (RS-2019-II190079, 10%; No. 2017-0-00417, 10%).

Abstract: We present a novel framework for audio-guided localized image stylization. Sound often conveys the specific context of a scene and is closely tied to a particular part of that scene or to an object within it. However, existing image stylization works have focused on stylizing the entire image from an image or text input; stylizing only a particular part of the image based on audio input is natural but challenging. This work proposes a framework in which a user provides one audio input to localize the target in the input image and another to locally stylize the target object or scene. We first produce a fine localization map using an audio-visual localization network that leverages the CLIP embedding space. We then use an implicit neural representation (INR), together with the predicted localization map, to stylize the target according to the sound information: the INR manipulates local pixel values so that they are semantically consistent with the provided audio input. Our experiments show that the proposed framework outperforms other audio-guided stylization methods. Moreover, our method constructs concise localization maps and naturally manipulates the target object or scene in accordance with the given audio input.
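The two-stage pipeline the abstract describes (localize, then blend an INR's output into the image only inside the localized region) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the tiny randomly initialized MLP stands in for a trained, audio-conditioned INR, and the hand-made binary `mask` stands in for the map predicted by the audio-visual localization network; all function names and shapes here are assumptions.

```python
import numpy as np

def inr_mlp(coords, w1, b1, w2, b2):
    # Toy implicit neural representation: maps (x, y) coordinates to RGB.
    # In the paper this network would be optimized against an audio-driven
    # CLIP-space objective; here its weights are just random placeholders.
    h = np.tanh(coords @ w1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))  # sigmoid -> RGB in [0, 1]

def localized_stylize(image, mask, w1, b1, w2, b2):
    # image: (H, W, 3) floats in [0, 1]; mask: (H, W) localization map in [0, 1].
    H, W, _ = image.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    coords = np.stack([xs, ys], axis=-1).reshape(-1, 2)
    styled = inr_mlp(coords, w1, b1, w2, b2).reshape(H, W, 3)
    # Blend: INR output replaces pixels only where the localization map is active.
    m = mask[..., None]
    return m * styled + (1.0 - m) * image

rng = np.random.default_rng(0)
H, W, hidden = 8, 8, 16
w1 = rng.normal(size=(2, hidden)); b1 = np.zeros(hidden)
w2 = rng.normal(size=(hidden, 3)); b2 = np.zeros(3)
img = rng.uniform(size=(H, W, 3))
mask = np.zeros((H, W))
mask[2:6, 2:6] = 1.0  # stand-in for the audio-predicted localization map
out = localized_stylize(img, mask, w1, b1, w2, b2)
```

Because the blend is a convex combination weighted by the mask, pixels outside the localized region are returned unchanged, which mirrors the paper's goal of stylizing only the target object or scene.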

Keywords: audio guidance; image style transfer; implicit neural representations (INR)

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]
