Speaker Recognition Based on Parameter Migration and C-LSTM  (Cited by: 1)

Author: NAN Zhaoying (Criminal Investigation Police University of China, Shenyang 110854, China)

Affiliation: [1] Criminal Investigation Police University of China, Shenyang 110854, Liaoning, China

Published in: Audio Engineering (《电声技术》), 2020, No. 11, pp. 37-41, 44 (6 pages)

Funding: National Key R&D Program of China (No. 2017YFC0821000).

Abstract: In speaker recognition research, most existing deep learning methods consider only the spatial or only the temporal features of speech, and they suffer from long training times and low recognition accuracy. The spectrogram is a special image, obtained by transforming the speech signal, that carries independent features in both the time and frequency domains. To fully extract these time- and frequency-domain features, a speaker recognition method based on parameter migration and C-LSTM is proposed, combining the strengths of Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. The method takes the spectrogram as network input: a CNN is first trained to obtain a pre-trained model whose parameters are migrated, after which the CNN's output feature matrix is reshaped and fed into an LSTM for further training. Experimental results show that the method improves the accuracy of voiceprint recognition and accelerates the convergence of the network.
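The key structural step described above is converting the CNN's output feature maps into a sequence the LSTM can consume. The paper does not publish exact layer sizes, so the shapes below are hypothetical; this is a minimal numpy sketch of that reshaping step, not the authors' implementation.

```python
import numpy as np

# Hypothetical CNN output for a batch of spectrograms:
# (batch, channels, freq_bins, time_steps) feature maps.
batch, channels, freq_bins, time_steps = 2, 32, 12, 20
cnn_features = np.random.rand(batch, channels, freq_bins, time_steps)

def cnn_to_lstm_sequence(features):
    """Turn CNN feature maps into an LSTM input sequence:
    each spectrogram time step becomes one feature vector of
    length channels * freq_bins."""
    b, c, f, t = features.shape
    # Move the time axis into the sequence position, then flatten
    # the channel and frequency axes into one feature dimension.
    return features.transpose(0, 3, 1, 2).reshape(b, t, c * f)

seq = cnn_to_lstm_sequence(cnn_features)
print(seq.shape)  # (2, 20, 384): batch, sequence length, features per step
```

The resulting `(batch, time, features)` tensor matches the input layout expected by common LSTM implementations, so each spectrogram column is treated as one step of the temporal sequence.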

Keywords: spectrogram; parameter migration; speaker recognition; long short-term memory network; convolutional neural network

Classification: TP183 [Automation and Computer Technology - Control Theory and Control Engineering]

 
