Multi-modal digital human modeling, synthesis, and driving: a survey

Authors: Gao Xuan[1], Liu Dongyu[1], Zhang Juyong[1]

Affiliation: [1] Key Laboratory of Computer Graphics and Perception Interaction in Anhui Province, University of Science and Technology of China, Hefei 230026, China

Source: Journal of Image and Graphics (《中国图象图形学报》), 2024, No. 9, pp. 2494-2512 (19 pages)

Funding: National Natural Science Foundation of China (62122071, 62272433).

Abstract: A multimodal digital human is a realistic, natural virtual human that can perform multimodal cognition and interaction and that thinks and behaves like a human being. In recent years, related technologies have made substantial progress owing to the cross-fertilization and rapid development of fields such as computer vision and natural language processing. This article discusses three major themes in computer graphics and computer vision: multimodal head animation, multimodal body animation, and multimodal portrait creation, and introduces their methodologies and representative works. Under the theme of multimodal head animation, it presents research on speech-driven and expression-driven head models. Under the theme of multimodal body animation, it covers body animation generation based on recurrent neural networks (RNN), Transformers, and denoising diffusion probabilistic models (DDPM). Under the theme of multimodal portrait creation, it covers portrait creation guided by visual-linguistic similarity, portrait creation guided by multimodal denoising diffusion models, and three-dimensional (3D) multimodal generative models of digital portraits. The article introduces and classifies representative works in these research directions, summarizes existing methods, and points out potential future research directions.

This article delves into key directions in the field of multimodal digital humans and covers multimodal head animation, multimodal body animation, and the construction of multimodal digital human representations. In the realm of multimodal head animation, we extensively explore two major tasks: expression- and speech-driven animation. For explicit and implicit parameterized models for expression-driven head animation, mesh surfaces and neural radiance fields (NeRF) are used to improve the rendering effects. Explicit models employ 3D morphable and linear models but encounter challenges, such as weak expressive capacity, nondifferentiable rendering, and difficult modeling of pe

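Keywords: virtual digital human modeling; multimodal character animation; multimodal generation and editing; neural rendering; generative model; neural implicit representation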

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
