Authors: Seung Hyun Lee, Sieun Kim, Wonmin Byeon, Gyeongrok Oh, Sumin In, Hyeongcheol Park, Sang Ho Yoon, Sung-Hee Hong, Jinkyu Kim, Sangpil Kim
Affiliations: [1] Department of Artificial Intelligence, Korea University, Seoul 02841, Republic of Korea; [2] NVIDIA Research, Nvidia Corporation, Santa Clara, CA 95051, USA; [3] Graduate School of Culture Technology, KAIST, Seoul 34141, Republic of Korea; [4] Hologram Research Center, Korea Electronics Technology Institute, Seoul 03924, Republic of Korea; [5] Department of Computer Science and Engineering, Korea University, Seoul 02841, Republic of Korea
Source: Computational Visual Media (计算可视媒体(英文版)), 2024, No. 6, pp. 1185-1204 (20 pages)
Funding: Supported by the Culture, Sports and Tourism R&D Program through the Korea Creative Content Agency grant funded by the Ministry of Culture, Sports and Tourism in 2022 (4D Content Generation and Copyright Protection with Artificial Intelligence, R2022020068, 30%; Research on neural watermark technology for copyright protection of generative AI 3D content, RS-2024-00348469, 40%; International Collaborative Research and Global Talent Development for the Development of Copyright Management and Protection Technologies for Generative AI, RS-2024-00345025, 10%); and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2019-II190079, 10%; No. 2017-0-00417, 10%).
Abstract: We present a novel framework for audio-guided localized image stylization. Sound often provides information about the specific context of a scene and is closely related to a certain part of the scene or object. However, existing image stylization works have focused on stylizing the entire image using an image or text input. Stylizing a particular part of the image based on audio input is natural but challenging. This work proposes a framework in which a user provides one audio input to localize the target in the input image and another to locally stylize the target object or scene. We first produce a fine localization map using an audio-visual localization network that leverages the CLIP embedding space. We then use an implicit neural representation (INR) together with the predicted localization map to stylize the target based on sound information. The INR manipulates local pixel values to be semantically consistent with the provided audio input. Our experiments show that the proposed framework outperforms other audio-guided stylization methods. Moreover, we observe that our method constructs concise localization maps and naturally manipulates the target object or scene in accordance with the given audio input.
Keywords: audio guidance; image style transfer; implicit neural representations (INR)
Classification number: TP3 [Automation and Computer Technology: Computer Science and Technology]