Author: YANG Jing[1]
Affiliation: [1] School of Information Engineering, Anhui Business and Technology College, Hefei 231131, Anhui, China
Source: Journal of Nanjing Institute of Technology (Natural Science Edition), 2023, No. 4, pp. 23-29 (7 pages)
Funding: Natural Science Research Project of Anhui Higher Education Institutions (KJ2021A1511).
Abstract: To improve the control accuracy of 3D facial animation, a mapping network driven by different speech emotions is designed to predict 3D face control parameters. Speech signals are first processed to generate spectrograms. The frequency-domain and time-frequency feature-extraction sub-networks are built on a convolutional neural network (CNN) with a channel attention mechanism, which strengthens the extraction of speech-emotion features. A Mogrifier long short-term memory network (Mogrifier LSTM) with multiple rounds of alternating operations replaces the bidirectional LSTM (BiLSTM), reinforcing the correspondence between preceding and following speech emotions and the face control parameters and improving temporal correlation. Comparative experiments show that the proposed method, which combines CNN, SENet and M-BiLSTM (CSEM), achieves speech emotion recognition and 3D face control parameter prediction across different emotions and different speakers. Compared with four other methods, CSEM reduces the mean error on the dataset by 23.9%, 40.6%, 13.4% and 6.0%, respectively; across eight emotions, its mean error is 5.4% lower than that of the method fusing CNN and BiLSTM (CBLM). While maintaining high temporal smoothness and control-parameter prediction accuracy, CSEM further improves the fluency and realism of 3D facial animation.
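Keywords: speech emotion; 3D facial animation; control parameter prediction; channel attention mechanism; expression details
Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]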
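Note: The abstract names a channel attention mechanism (SENet-style) applied to CNN feature maps but gives no implementation details. The following is a minimal sketch of the generic squeeze-and-excitation idea only, not the paper's network: the function name `channel_attention`, the weight matrices `w_reduce` and `w_expand`, and the toy input are all hypothetical, and biases are omitted for brevity.

```python
import math


def sigmoid(z):
    """Logistic function used to turn excitation scores into (0, 1) weights."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def channel_attention(feature_maps, w_reduce, w_expand):
    """Generic squeeze-and-excitation over C channels (hypothetical helper).

    feature_maps: list of C channels, each a 2-D list (e.g. frequency x time).
    w_reduce:     (C/r) x C weights of the bottleneck layer (assumed, no bias).
    w_expand:     C x (C/r) weights of the expansion layer (assumed, no bias).
    Returns the rescaled feature maps and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, h) for h in matvec(w_reduce, squeezed)]
    weights = [sigmoid(s) for s in matvec(w_expand, hidden)]
    # Scale: multiply every value in a channel by that channel's weight.
    scaled = [
        [[weight * value for value in row] for row in channel]
        for channel, weight in zip(feature_maps, weights)
    ]
    return scaled, weights


# Toy input: two 2x2 "spectrogram feature" channels (illustrative values only).
toy_maps = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
toy_scaled, toy_weights = channel_attention(
    toy_maps, w_reduce=[[1.0, 1.0]], w_expand=[[1.0], [-1.0]]
)
```

With these toy weights the first channel is emphasized and the second suppressed; in a trained network the two weight matrices would be learned together with the convolutional layers.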