Prediction of Control Parameters for 3D Face Animation Based on Different Speech Emotions (基于不同语音情绪的三维人脸动画控制参数预测)

Author: YANG Jing (杨静) [1] (School of Information Engineering, Anhui Business and Technology College, Hefei 231131, China)

Source: Journal of Nanjing Institute of Technology (Natural Science Edition), 2023, No. 4, pp. 23-29 (7 pages)

Funding: Natural Science Research Project of Anhui Higher Education Institutions (KJ2021A1511).

Abstract: To improve the control accuracy of 3D face animation, a mapping network based on different speech emotions is designed to predict 3D face control parameters. Speech signals are processed to generate spectrograms. The frequency-domain and time-frequency feature extraction sub-networks are built on a convolutional neural network (CNN) architecture, with a channel attention mechanism incorporated to strengthen the extraction of speech emotion features. Mogrifier long short-term memory (Mogrifier LSTM), which applies multiple rounds of alternating operations, replaces bidirectional long short-term memory (BiLSTM) to reinforce the correspondence between preceding and following speech emotions and the face control parameters, and to improve temporal correlation. Comparative experiments show that the proposed method, which combines CNN, SENet, and M-BiLSTM (CSEM), achieves speech emotion recognition and 3D face control parameter prediction across different emotions and different speakers. Compared with the four other methods, CSEM reduces the mean error on the dataset by 23.9%, 40.6%, 13.4%, and 6.0%, respectively. Across eight different emotions, the mean error of CSEM is 5.4% lower than that of the method fusing CNN and BiLSTM (CBLM). While maintaining high temporal smoothness and control parameter prediction accuracy, CSEM further improves the fluidity and realism of 3D face animation.
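The abstract mentions that the Mogrifier LSTM applies multiple rounds of alternating operations before the recurrent update. As a rough illustration only, the sketch below follows the commonly published Mogrifier formulation, in which the current input and the previous hidden state take turns rescaling each other through a sigmoid gate. The dimensions, the number of rounds, the random weights, and the helper names (`mogrify`, `matvec`, `random_matrix`) are all assumptions made for this example; the paper's actual configuration is not given in the abstract.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def mogrify(x, h, q_mats, r_mats, rounds):
    """Alternately gate the input x and the previous hidden state h.

    Odd rounds rescale x using h; even rounds rescale h using x.
    q_mats[k] has shape (len(x), len(h)); r_mats[k] has shape (len(h), len(x)).
    The factor 2 keeps the gate centred on 1, so zero weights leave values unchanged.
    """
    x, h = list(x), list(h)
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            gate = matvec(q_mats[(i - 1) // 2], h)
            x = [2.0 * sigmoid(g) * xi for g, xi in zip(gate, x)]
        else:
            gate = matvec(r_mats[i // 2 - 1], x)
            h = [2.0 * sigmoid(g) * hi for g, hi in zip(gate, h)]
    return x, h


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: a 4-dimensional feature vector and a 3-dimensional hidden state.
rng = random.Random(0)
x_dim, h_dim, rounds = 4, 3, 5
q_mats = [random_matrix(x_dim, h_dim, rng) for _ in range((rounds + 1) // 2)]
r_mats = [random_matrix(h_dim, x_dim, rng) for _ in range(rounds // 2)]

x0 = [0.5, -1.0, 0.25, 2.0]
h0 = [0.1, 0.0, -0.3]
x_mod, h_mod = mogrify(x0, h0, q_mats, r_mats, rounds)
```

In a full model, `x_mod` and `h_mod` would then be fed to an ordinary LSTM cell in place of the raw input and hidden state; that step is omitted here.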

Keywords: speech emotion; 3D face animation; control parameter prediction; channel attention mechanism; expression details

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]
