Authors: LIU Yi; JIN Xiaofeng (College of Engineering, Yanbian University, Yanji 133002, China)
Source: Journal of Yanbian University (Natural Science Edition), 2020, No. 3, pp. 215-220 (6 pages)
Funding: Jilin Provincial Department of Education "13th Five-Year Plan" Science and Technology Project (JJKH20191126KJ); Yanbian University World-Class Discipline Construction Cultivation Project (18YLPY14).
Abstract: To address the problem of mapping speech features to facial features in face animation, a mapping-model learning method based on the bidirectional long short-term memory (Bi-LSTM) network is proposed. First, MFCC parameters of the speech signal and facial landmark parameters of the corresponding video frame sequence are extracted synchronously from training videos. Second, during training, the MFCC parameters are fed to the Bi-LSTM network as input and the facial landmark parameters serve as the expected output; a parameter-tuning procedure is introduced to experimentally optimize the number of training iterations, the number of hidden units, the batch size, and the optimizer type, yielding the optimal mapping model. Experiments on the optimal mapping model show that the bidirectional Bi-LSTM network clearly outperforms the unidirectional LSTM network, and the mapping accuracy reaches 0.895 after parameter tuning. The proposed method can therefore provide effective predicted facial landmark parameters for subsequent speech-driven face video synthesis applications.
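The mapping described in the abstract, a Bi-LSTM that takes a sequence of MFCC frames and predicts a facial landmark vector per frame, can be sketched as follows. This is a minimal NumPy illustration of the forward pass only (the paper's actual architecture, layer sizes, and training setup are not given here); the hidden size of 32, the 13-dimensional MFCC frames, and the 68-point landmark layout are assumed values for illustration, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_forward(x, Wx, Wh, b):
    """One-direction LSTM over a sequence.
    x: (T, D) MFCC frames; Wx: (4H, D); Wh: (4H, H); b: (4H,).
    Gates are stacked in the order i, f, g, o. Returns hidden states (T, H)."""
    T = x.shape[0]
    H = Wh.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    out = np.empty((T, H))
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for t in range(T):
        gates = Wx @ x[t] + Wh @ h + b
        i, f, g, o = np.split(gates, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state update
        h = sigmoid(o) * np.tanh(c)                   # hidden state
        out[t] = h
    return out

def bilstm_predict(x, params):
    """Bidirectional pass: run the sequence forward and reversed,
    concatenate both hidden sequences, project to landmark coordinates."""
    fwd = lstm_forward(x, *params["fwd"])
    bwd = lstm_forward(x[::-1], *params["bwd"])[::-1]  # re-align in time
    h = np.concatenate([fwd, bwd], axis=1)             # (T, 2H)
    return h @ params["Wo"] + params["bo"]             # (T, 2L): x,y per landmark

def init_params(D, H, L):
    """Random (untrained) parameters, just to make the shapes concrete."""
    mk = lambda *s: rng.standard_normal(s) * 0.1
    return {
        "fwd": (mk(4 * H, D), mk(4 * H, H), np.zeros(4 * H)),
        "bwd": (mk(4 * H, D), mk(4 * H, H), np.zeros(4 * H)),
        "Wo": mk(2 * H, 2 * L),
        "bo": np.zeros(2 * L),
    }

# Assumed toy shapes: 13-dim MFCC frames, 32 hidden units, 68 landmarks.
params = init_params(D=13, H=32, L=68)
mfcc_seq = rng.standard_normal((50, 13))  # 50 audio frames
pred = bilstm_predict(mfcc_seq, params)
print(pred.shape)  # one 136-dim landmark vector per input frame
```

Because the backward pass sees future frames, each predicted landmark vector can depend on both preceding and following audio context, which is the advantage the abstract reports for Bi-LSTM over unidirectional LSTM.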
Keywords: face animation; Mel-frequency cepstral coefficients (MFCC); bidirectional long short-term memory network (Bi-LSTM); parameter tuning
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]