A survey on multimodal information-guided 3D human motion generation

Authors: Zhao Baoquan, Fu Yiyu, Su Zhuo[2], Wang Ruomei[2], Lyu Chenlei, Luo Xiaonan

Affiliations: [1] School of Artificial Intelligence, Sun Yat-sen University, Zhuhai 519000, China; [2] School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; [3] College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China; [4] School of Computer and Information Security, Guilin University of Electronic Science and Technology, Guilin 541004, China

Source: Journal of Image and Graphics, 2024, No. 9, pp. 2541-2565 (25 pages)

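Funding: National Key Research and Development Program of China (2022YFF0903103); Natural Science Foundation of Guangdong Province (2023A1515011639); Fundamental Research Funds for the Central Universities (23xkjc019, 24qnpy145).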

Abstract: Multimodal information-guided 3D digital human motion generation aims to generate human motion under specific input conditions from data such as text, audio, images, and video. The technology has significant application value and broad economic and social benefits in film, animation, game production, and the metaverse. It has been one of the research hotspots in computer graphics and computer vision in recent years. However, multimodal 3D digital human motion generation faces many challenges:

- representing and fusing cross-modal information is difficult;
- high-quality datasets are lacking;
- generated motion is often of poor quality (for example, jitter, interpenetration, and foot sliding);
- generation efficiency is low.

Researchers have proposed a wide variety of solutions to these challenges in recent years. Even so, how to achieve efficient, high-quality 3D digital human motion generation that suits the characteristics of each data modality remains an open problem. Using the model architecture as the classification criterion, this paper divides existing mainstream methods into four groups: methods based on generative adversarial networks (GAN), autoencoders (AE), variational autoencoders (VAE), and diffusion models. It then summarizes them in a general framework for digital human motion generation. The paper also introduces the parametric human body models, datasets, and evaluation metrics commonly used in the field. For several representative works, it runs comparative experiments on commonly used datasets to evaluate their performance. Finally, drawing on existing datasets, algorithms, and representative studies, it summarizes the open problems and challenges in the field. It also discusses potential research directions: improving datasets, optimizing motion quality and diversity, fusing cross-modal information, and raising generation efficiency.

Three-dimensional (3D) digital human motion generation guided by multimodal information generates human motion under specific input conditions from data such as text, audio, images, and video. This technology has a wide spectrum of applications and extensive economic and social benefits in film, animation, game production, the metaverse, and other fields. It is one of the research hotspots in computer graphics and computer vision. However, the task faces major challenges:

- representing and fusing multimodal information is difficult;
- high-quality datasets are lacking;
- generated motion is often of poor quality (such as jitter, penetration, and foot sliding);
- generation efficiency is low.

Although various solutions have been proposed to address these challenges, achieving efficient, high-quality 3D digital human motion generation based on the characteristics of different data modalities remains an open problem. This paper comprehensively reviews 3D digital human motion generation. It elaborates on recent advances from the perspectives of parametrized 3D human models, human motion representation, motion generation techniques, motion analysis and editing, existing human motion datasets, and evaluation metrics. Parametrized human models facilitate digital human modeling and motion generation by providing parameters associated with body shape and posture. They serve as key pillars of current digital human research and applications. The survey begins with an introduction to widely used parametrized 3D human body models: shape completion and animation of people (SCAPE), the skinned multi-person linear model (SMPL), SMPL-X, and SMPL-H. It compares them in detail in terms of model representation and the parameters used to control body shape, pose, and facial expression. Human motion representation is a core issue in digital human motion generation. This work highlights the musculoskeletal model and classic skinning algorithms, including linear blend skinning…
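The abstract lists linear blend skinning among the classic skinning algorithms the survey covers. The sketch below is a hedged illustration, not the authors' code. It deforms each rest-pose vertex as a weighted sum of per-bone rigid transforms, v'_i = sum_j w_ij * G_j * v_i.

It rests on these assumptions:

- Each 4x4 matrix G_j is already a skinning matrix, that is, the posed joint transform multiplied by the inverse bind matrix.
- Each vertex's weights sum to 1.
- The names `apply_transform` and `linear_blend_skinning` are hypothetical.

```python
from typing import List, Sequence

Matrix4 = List[List[float]]  # row-major 4x4 affine matrix
Vec3 = Sequence[float]


def apply_transform(m: Matrix4, v: Vec3) -> List[float]:
    """Apply a 4x4 affine transform to a 3D point (homogeneous coordinate w = 1)."""
    x, y, z = v
    return [m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3)]


def linear_blend_skinning(
    rest_vertices: Sequence[Vec3],
    weights: Sequence[Sequence[float]],
    bone_transforms: Sequence[Matrix4],
) -> List[List[float]]:
    """Deform rest-pose vertices as weighted sums of per-bone rigid transforms.

    rest_vertices:   rest-pose (x, y, z) positions
    weights:         weights[i][j] = influence of bone j on vertex i (assumed to sum to 1)
    bone_transforms: 4x4 skinning matrices, assumed to map the rest pose directly
                     to the posed configuration (posed transform times inverse bind matrix)
    """
    posed = []
    for v, w_row in zip(rest_vertices, weights):
        out = [0.0, 0.0, 0.0]
        for w, m in zip(w_row, bone_transforms):
            if w == 0.0:
                continue  # this bone does not influence the vertex
            p = apply_transform(m, v)
            for k in range(3):
                out[k] += w * p[k]  # linear blend of the transformed positions
        posed.append(out)
    return posed


if __name__ == "__main__":
    identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    shift_x = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # translate +1 along x
    verts = [(0, 0, 0), (0, 1, 0)]
    w = [[1.0, 0.0], [0.5, 0.5]]  # second vertex is shared equally by both bones
    print(linear_blend_skinning(verts, w, [identity, shift_x]))
```

Because the transforms are averaged linearly, strongly twisted joints can lose volume (the "candy-wrapper" artifact). SMPL-family models reduce such errors by adding pose-dependent corrective blend shapes before skinning.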

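Keywords: 3D digital human motion generation; multimodal information; parametric human body model; generative adversarial network (GAN); autoencoder (AE); variational autoencoder (VAE); diffusion model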

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
