Author: NAN Zhaoying (南兆营) (Criminal Investigation Police University of China, Shenyang 110854, China)
Source: Audio Engineering (《电声技术》), 2020, No. 11, pp. 37-41, 44 (6 pages in total)
Funding: National Key Research and Development Program of China (No. 2017YFC0821000).
Abstract: In speaker recognition research, most existing deep learning methods consider only the spatial features or only the temporal features of speech, and they suffer from long training times and low recognition accuracy. A spectrogram is a special kind of image, obtained by transforming the speech signal, that carries independent features in both the time domain and the frequency domain. To fully extract the emotional features of the spectrogram in both domains, the paper combines the characteristics of Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks and proposes a speaker recognition method based on parameter transfer and C-LSTM. The method takes the spectrogram as the network input, trains a CNN to obtain a pre-trained model and transfers its parameters, then transforms the feature matrix output by the CNN and feeds it into an LSTM for training. Experimental results show that the method improves voiceprint recognition accuracy and speeds up network convergence.
Keywords: spectrogram; parameter transfer; speaker recognition; long short-term memory network; convolutional neural network
Classification number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
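The abstract describes two data-handling steps around the networks: turning speech into a spectrogram, and converting the CNN's output feature matrix into a form an LSTM can consume. The following is a minimal NumPy sketch of those two steps only, not the paper's implementation. The function names, the frame length of 256, the hop of 128, the Hann window, and the log scaling are all assumptions chosen for illustration; the record does not state the actual settings. The CNN itself is replaced by a random array standing in for its feature maps.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames: (frames, frame_len).
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame gives frame_len // 2 + 1 frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Compress the dynamic range and put frequency on the first axis.
    return np.log1p(magnitude).T


def feature_maps_to_sequence(feature_maps):
    """Reshape CNN feature maps (channels, freq, time) into (time, channels * freq).

    Each time step becomes one feature vector, which is the layout a
    recurrent layer such as an LSTM expects for a single utterance.
    """
    channels, freq, time = feature_maps.shape
    return feature_maps.transpose(2, 0, 1).reshape(time, channels * freq)


if __name__ == "__main__":
    sample_rate = 8000  # hypothetical sampling rate for the demo tone
    t = np.arange(sample_rate) / sample_rate
    spec = spectrogram(np.sin(2 * np.pi * 440 * t))
    print("spectrogram shape:", spec.shape)

    # Stand-in for CNN output: 4 channels, 8 frequency rows, 10 time columns.
    fake_maps = np.random.default_rng(0).normal(size=(4, 8, 10))
    print("LSTM input shape:", feature_maps_to_sequence(fake_maps).shape)
```

Keeping time as the leading axis of the reshaped array preserves the temporal order of the spectrogram columns, so the recurrent network sees the CNN's per-frame features in sequence.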