Deep Learning Method for Speech Emotion Recognition Based on Improved K-Mean Clustering


Authors: Li Qiaojun (李巧君) [1]; Guo Guo (郭彍) [2]

Affiliations: [1] School of Electronic Information Engineering, Henan Polytechnic Institute, Nanyang 473000, Henan, China; [2] College of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, Sichuan, China

Source: Computer Applications and Software (《计算机应用与软件》), 2024, Issue 9, pp. 224-229 (6 pages)

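Funding: Key Scientific Research Project of Colleges and Universities in Henan Province (19A520022); Training Program for Young Backbone Teachers of Higher Vocational Schools in Henan Province (document no. 教职成函[2019]326号).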

Abstract: To address the low accuracy and high time complexity of current speech emotion recognition (SER) methods, a deep learning method for speech emotion recognition based on improved K-means clustering is proposed. An improved K-means clustering algorithm selects the key segments that reflect emotional features from the whole audio signal. The selected sequence is converted into a spectrogram using the short-time Fourier transform. A deep residual network (ResNet) and a deep bidirectional long short-term memory (Bi-LSTM) network then learn the emotion-related hidden features of the spectrogram in the spatial and temporal dimensions, and a Softmax classifier produces the final emotion classification. Experimental results show that the proposed method has clear advantages over other recognition methods: it improves the emotion recognition rate while reducing the model's processing time.
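The first two stages described in the abstract (key-segment selection by clustering, then a short-time Fourier transform) can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: plain Lloyd's K-means stands in for the authors' improved variant, which the abstract does not specify; the per-segment feature (RMS energy, zero-crossing rate) is a hypothetical choice; all function names and parameters are illustrative; and the ResNet and Bi-LSTM stages are omitted.

```python
import cmath
import math


def split_segments(signal, seg_len):
    """Cut the signal into non-overlapping segments of seg_len samples."""
    return [signal[i:i + seg_len]
            for i in range(0, len(signal) - seg_len + 1, seg_len)]


def segment_features(seg):
    """Hypothetical 2-D feature (not from the paper): RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in seg) / len(seg))
    zcr = sum(1 for a, b in zip(seg, seg[1:]) if (a < 0) != (b < 0)) / (len(seg) - 1)
    return (rms, zcr)


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means (a stand-in for the improved variant), evenly spaced init."""
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[c]
            for c, cl in enumerate(clusters)
        ]
    return centroids


def select_key_segments(segments, k):
    """For each centroid, keep the segment whose features lie closest to it."""
    feats = [segment_features(s) for s in segments]
    centroids = kmeans(feats, k)
    chosen = {min(range(len(feats)), key=lambda i: math.dist(feats[i], c))
              for c in centroids}
    return [segments[i] for i in sorted(chosen)]


def stft_magnitude(signal, frame_len=32, hop=16):
    """Magnitude spectrogram from a Hann window and a naive DFT (one row per frame)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        rows.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * f * n / frame_len)
                    for n in range(frame_len)))
            for f in range(frame_len // 2 + 1)
        ])
    return rows


# Synthetic input: a quiet low-frequency tone followed by a loud higher-frequency tone.
signal = ([0.1 * math.sin(2 * math.pi * 2 * n / 64) for n in range(256)]
          + [1.0 * math.sin(2 * math.pi * 8 * n / 64) for n in range(256)])
segments = split_segments(signal, 64)
key_segments = select_key_segments(segments, k=2)
spectrogram = stft_magnitude([x for seg in key_segments for x in seg])
```

In this toy setting the clustering keeps one representative segment per cluster, so the spectrogram is computed over far fewer samples than the full signal, which is the kind of saving in processing time the abstract refers to. A practical implementation would more likely rely on an FFT-based STFT from a numerical library than on the naive DFT used here.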

Keywords: speech emotion recognition; deep bidirectional long short-term memory; K-means clustering; short-time Fourier transform

Classification code: TP393 [Automation and Computer Technology: Computer Application Technology]

 
