A mapping model of facial features and speech features based on Bi-LSTM  (Cited by: 1)

Authors: LIU Yi; JIN Xiaofeng[1] (College of Engineering, Yanbian University, Yanji 133002, China)

Affiliation: [1] College of Engineering, Yanbian University, Yanji 133002, Jilin, China

Source: Journal of Yanbian University (Natural Science Edition), 2020, No. 3, pp. 215-220 (6 pages)

Funding: Jilin Provincial Department of Education "13th Five-Year" Science and Technology Project (JJKH20191126KJ); Yanbian University World-Class Discipline Construction Cultivation Project (18YLPY14).

Abstract: To address the problem of mapping between facial features and speech features in face animation, a mapping-model learning method based on the bidirectional long short-term memory network (Bi-LSTM) is proposed. First, the MFCC parameters of the speech signal and the facial landmark parameters of the video frame sequence are extracted synchronously from the training videos. Second, during training the MFCC parameters serve as the input of the Bi-LSTM network and the facial landmark parameters as its expected output; a parameter-tuning procedure is applied to experimentally optimize the number of epochs, the number of hidden units, the batch size, and the optimizer type, yielding the optimal mapping model. Experiments on the optimal model show that the bidirectional Bi-LSTM network clearly outperforms a unidirectional LSTM, and that after parameter tuning the mapping accuracy reaches 0.895. The proposed method can therefore provide effective predicted facial landmark parameters for subsequent speech-driven face video synthesis.

Keywords: face animation; Mel-frequency cepstral coefficients (MFCC); bidirectional long short-term memory network (Bi-LSTM); parameter tuning

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
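The abstract's architecture (MFCC frames in, facial landmark coordinates out, through a bidirectional LSTM) can be sketched as follows. This is a minimal illustration only: the paper does not publish its exact layer sizes in the abstract, so the 13 MFCC coefficients, 128 hidden units, and 68 landmarks used here are assumed placeholder values, not the authors' tuned configuration.

```python
import torch
import torch.nn as nn

class Audio2Face(nn.Module):
    """Hypothetical sketch of an MFCC-to-landmark mapping model.
    Sizes (13 MFCCs, 128 hidden units, 68 landmarks) are assumptions."""
    def __init__(self, n_mfcc=13, hidden=128, n_landmarks=68):
        super().__init__()
        # bidirectional=True is what distinguishes the Bi-LSTM from the
        # plain LSTM baseline the paper compares against
        self.bilstm = nn.LSTM(n_mfcc, hidden,
                              batch_first=True, bidirectional=True)
        # forward and backward hidden states are concatenated -> 2*hidden;
        # the head regresses an (x, y) pair per landmark
        self.head = nn.Linear(2 * hidden, 2 * n_landmarks)

    def forward(self, mfcc):            # mfcc: (batch, frames, n_mfcc)
        h, _ = self.bilstm(mfcc)        # h: (batch, frames, 2*hidden)
        return self.head(h)             # (batch, frames, 2*n_landmarks)

model = Audio2Face()
out = model(torch.randn(4, 50, 13))    # 4 clips, 50 frames of 13 MFCCs each
print(tuple(out.shape))                # -> (4, 50, 136)
```

Training such a model as the abstract describes would minimize a regression loss (e.g. MSE) between the predicted and ground-truth landmark sequences, with epochs, hidden size, batch size, and optimizer tuned experimentally.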
