Deep learning speaker recognition based on improved spectrogram

Authors: MA Zhiju, DU Qingzhi[1], LONG Hua[1], SHAO Yubin[1]

Affiliation: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China

Source: Modern Electronics Technique (《现代电子技术》), 2023, No. 21, pp. 32-38 (7 pages)

Abstract: To improve the performance of speaker recognition systems, a deep learning speaker recognition algorithm based on an improved spectrogram is proposed. A spectrogram carries many kinds of information, including speech content, emotion, language and speaker identity. Previous speaker recognition algorithms often ignore speaker-identity characteristics and feed the spectrogram extracted directly from speech into the network, whereas a speaker recognition system needs the identity-related information in the spectrogram; the original spectrogram therefore has to be improved. Within a spectrogram, the pitch (fundamental) frequency and the formants best characterize a speaker's identity. The proposed method therefore applies adaptive comb filtering driven by the pitch frequency of each frame of the speech signal to obtain an improved spectrogram, and then extracts speaker features with a convolutional neural network to raise recognition accuracy. The network model is MobileNetV2, which has few parameters, converges quickly and recognizes quickly, making it well suited to practical applications. In the control experiments, the method improves accuracy over the original spectrogram by 2.3%, 5.2% and 3%, respectively.
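The abstract does not give the exact filter design, so the following is only a minimal NumPy sketch of the general idea of a pitch-adaptive comb-filtered spectrogram, not the authors' method. It assumes a simple autocorrelation pitch estimator (search range 60 to 400 Hz, voicing threshold 0.3), 32 ms frames with a 10 ms hop at 16 kHz, and a comb whose magnitude response is that of the feedforward filter y[n] = (x[n] + x[n - fs/f0]) / 2, i.e. |cos(pi f / f0)|. The response is applied to each frame's spectrum as a gain, which keeps energy at the harmonics k*f0 and attenuates the regions between them. All function names and parameter values are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0, threshold=0.3):
    """Autocorrelation pitch estimate (assumed method); 0.0 means 'treated as unvoiced'."""
    frame = frame - frame.mean()
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    if r[0] <= 1e-12:  # silent frame
        return 0.0
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    # Accept the peak only if it is strong relative to the frame energy.
    return fs / lag if r[lag] / r[0] > threshold else 0.0

def comb_spectrogram(x, fs, frame_len=512, hop=160, n_fft=512):
    """Log-magnitude spectrogram with a per-frame, pitch-adaptive comb gain (hypothetical)."""
    raw = frame_signal(x, frame_len, hop)
    mag = np.abs(np.fft.rfft(raw * np.hanning(frame_len), n=n_fft, axis=1))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    f0s = np.array([estimate_f0(frame, fs) for frame in raw])
    for i, f0 in enumerate(f0s):
        if f0 > 0:
            # |(1 + exp(-j*2*pi*f/f0)) / 2| = |cos(pi*f/f0)|: peaks at k*f0, nulls halfway between.
            mag[i] *= np.abs(np.cos(np.pi * freqs / f0))
    return np.log(mag + 1e-8), f0s  # small floor avoids log(0)
```

For example, `spec, f0s = comb_spectrogram(x, 16000)` on a 16 kHz recording returns one spectrogram row per 10 ms hop, with `f0s` set to 0 for frames the sketch treats as unvoiced; those frames are left unfiltered. The resulting `spec` would then be the CNN input in place of the plain spectrogram.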

Keywords: spectrogram; pitch frequency; comb filter; deep learning; speaker recognition; depthwise separable convolution
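The keyword "depthwise separable convolution" names the building block behind the abstract's claim that MobileNetV2 has few parameters. A rough sketch of the weight count, assuming a k x k kernel with c_in input and c_out output channels and ignoring biases and batch-norm parameters (the layer sizes in the example are illustrative, not taken from the paper):

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution: every output channel sees every input channel."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

std = standard_conv_params(3, 32, 64)        # 18432
sep = depthwise_separable_params(3, 32, 64)  # 288 + 2048 = 2336
print(std, sep, round(sep / std, 4))         # ratio equals 1/c_out + 1/k**2
```

With 3 x 3 kernels the ratio is 1/c_out + 1/9, so the separable form needs roughly 8 to 9 times fewer weights once c_out is large.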

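CLC numbers: TN912.34-34 [Electronics and Telecommunications: Communication and Information Systems]; TP183 [Electronics and Telecommunications: Information and Communication Engineering]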