Emotion-Controlled Personalized and Complete 3D Avatar Expression Animation Generation  (Cited by: 1)

Authors: LI Junyi; PANG Delong; CAI Mingxu; ZHOU Shengyu; YU Minjing (College of Intelligence and Computing, Tianjin University, Tianjin 300350, China)

Affiliation: [1] College of Intelligence and Computing, Tianjin University, Tianjin 300350, China

Source: Journal of Signal Processing, 2025, No. 2, pp. 382-398 (17 pages)

Funding: National Natural Science Foundation of China (62002258); Beijing Natural Science Foundation (L222113).

Abstract: Speech-driven emotional expression animation of 3D avatars aims to synthesize 3D facial animations whose lip movements and facial expressions are synchronized with the input speech. However, constrained by 3D face priors, existing methods struggle to synthesize 3D facial animations that include the internal oral structure, so the final results lack realism. Moreover, most existing methods concentrate on synchronizing the avatar's lip movements with the speech and pay little attention to how emotional changes in the speech shape facial expressions, so the generated expression animations are not natural enough, their realism is limited, and the user experience suffers. To address these problems, this paper proposes an emotion-controlled personalized and complete 3D avatar expression animation generation method that produces facial animations with a complete oral structure and rich emotional expressions, improving the realism of 3D avatars. The method consists of three core modules: a neutral expression animation generation module with a complete oral structure, an expression retrieval module, and an expression fusion module. The neutral expression animation generation module first performs cross-modal mapping from speech to 3D facial animation sequences with a Transformer-based autoregressive model, outputting a neutral facial animation sequence, and introduces a text-driven consistency loss through a cross-supervised training graph to ensure synchronization between the input speech and the lip region. Within this module, the paper then proposes and applies an oral-structure 3D model deformation algorithm based on facial landmarks, which fuses the generated oral models frame by frame with the corresponding neutral facial animation sequence and outputs a sequence of neutral expression models containing the oral structure. The expression retrieval module performs emotion recognition and retrieval on the input speech sequence and face image to obtain an emotional 3D face model. The expression fusion module fuses the neutral expression animation containing the oral structure with the emotional 3D face model through a deep neural network, generating 3D facial expression animations with both an oral structure and emotional expressions. In addition, the paper proposes a linear-interpolation-based expression transition algorithm that achieves smooth transitions of the expression animation between multiple emotions. Experiments show that the generated 3D facial animations, which contain the oral structure and convey emotional expressions, keep the lip movements synchronized with the input speech.
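The abstract describes the expression transition step only as linear interpolation between emotions. As an illustration of that idea, the following is a minimal sketch of per-vertex linear interpolation between two expression meshes; the function names and the flat coordinate-list mesh representation are assumptions for illustration, not taken from the paper:

```python
def lerp_expression(src_vertices, dst_vertices, t):
    """Linearly interpolate between two expression meshes.

    src_vertices / dst_vertices: flat lists of vertex coordinates for
    two meshes with identical topology; t in [0, 1] blends from the
    source expression to the target expression.
    """
    if len(src_vertices) != len(dst_vertices):
        raise ValueError("meshes must share topology")
    t = max(0.0, min(1.0, t))  # clamp to the valid blend range
    return [(1.0 - t) * s + t * d for s, d in zip(src_vertices, dst_vertices)]


def transition_frames(src_vertices, dst_vertices, num_frames):
    """Generate a smooth transition sequence between two emotions."""
    if num_frames < 2:
        raise ValueError("need at least a start and an end frame")
    return [
        lerp_expression(src_vertices, dst_vertices, i / (num_frames - 1))
        for i in range(num_frames)
    ]


# Toy example: transition a single-vertex "mesh" from neutral to happy.
neutral = [0.0, 0.0, 0.0]
happy = [1.0, 2.0, 0.5]
frames = transition_frames(neutral, happy, 5)
```

In practice such interpolation would run over full vertex buffers (or blendshape weights) per animation frame; the sketch only conveys the blending arithmetic.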

Keywords: speech-driven; emotion-driven; 3D avatar; facial expression animation

CLC number: TP391.41 [Automation and Computer Technology - Computer Application Technology]

 
