Authors: Gao Xuan (高玄); Liu Dongyu (刘东宇)[1]; Zhang Juyong (张举勇)[1]
Affiliation: [1] Key Laboratory of Computer Graphics and Perception Interaction in Anhui Province, University of Science and Technology of China, Hefei 230026, China
Source: Journal of Image and Graphics (《中国图象图形学报》), 2024, No. 9, pp. 2494-2512 (19 pages)
Funding: National Natural Science Foundation of China (62122071, 62272433).
Abstract: A multimodal digital human is a realistic, natural virtual human that can perform multimodal cognition and interaction and that thinks and behaves according to human-like logic. Related technologies have progressed substantially in recent years, driven by the cross-fertilization and rapid development of fields such as computer vision and natural language processing. This article discusses three major themes in computer graphics and computer vision: multimodal head animation, multimodal body animation, and multimodal portrait creation, and introduces the methodologies and representative works of each. Under the theme of multimodal head animation, it presents research on speech- and expression-driven head models. Under the theme of multimodal body animation, it covers recurrent neural network (RNN)-, Transformer-, and denoising diffusion probabilistic model (DDPM)-based body animation generation. Under the theme of multimodal portrait creation, it covers portrait creation guided by visual-linguistic similarity, portrait creation guided by multimodal denoising diffusion models, and three-dimensional (3D) multimodal generative models of digital portraits. The article introduces and classifies representative works in these research directions, summarizes existing methods, and points out potential future research directions.
This article delves into key directions in the field of multimodal digital humans and covers multimodal head animation, multimodal body animation, and the construction of multimodal digital human representations. In the realm of multimodal head animation, we extensively explore two major tasks: expression- and speech-driven animation. For explicit and implicit parameterized models for expression-driven head animation, mesh surfaces and neural radiance fields (NeRF) are used to improve the rendering effects. Explicit models employ 3D morphable and linear models but encounter challenges, such as weak expressive capacity, nondifferentiable rendering, and difficult modeling of pe
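Keywords: virtual digital human modeling; multimodal character animation; multimodal generation and editing; neural rendering; generative models; neural implicit representation
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]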