Application of Autoencoder and LSTM in Mixed Speech Emotion

Cited by: 1


Authors: ZHANG Wei; JIA Yu; ZHANG Xue-Ying (College of Information, Shanxi University of Finance and Economics, Taiyuan, Shanxi 030006, China; College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China)

Affiliations: [1] College of Information, Shanxi University of Finance and Economics, Taiyuan, Shanxi 030006, China; [2] College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China

Source: Computer Simulation (《计算机仿真》), 2022, No. 11, pp. 258-262 (5 pages)

Funding: National Youth Science Foundation of China (61902226); Shanxi Provincial Youth Science and Technology Research Fund (201901D211415); Scientific and Technological Innovation Program of Higher Education Institutions in Shanxi (2019L0498); Shanxi University of Finance and Economics Youth Research Fund (QN-2019017).

Abstract: In mixed speech emotion recognition, traditional methods cannot fully account for the differences between languages, which leads to low classification accuracy. To address this, a method combining an autoencoder with a Long Short-Term Memory (LSTM) model is proposed. The method extracts MFCC, Mel spectrogram, and chroma features to form a 180-dimensional feature vector, then uses an autoencoder to obtain a higher-dimensional, deeper 500-dimensional representation, which is modeled by an LSTM to improve the accuracy of speech emotion classification. Classification experiments were carried out on the German EMO-DB and Chinese CASIA corpora. The results show that the deep features extracted by the autoencoder are better suited to mixed speech emotion classification: compared with traditional classification methods, the autoencoder + LSTM approach improves the best recognition result by up to 7.5%.
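The record contains no implementation details, so the following is only a minimal sketch of the pipeline the abstract describes, assuming librosa for feature extraction and Keras for the models. The 40 MFCC + 128 Mel + 12 chroma split of the 180 dimensions, all layer sizes other than 180 and 500, and every function name below are illustrative assumptions, not the authors' code.

```python
# Sketch of the abstract's pipeline: 180-dim acoustic features -> 500-dim
# autoencoder code -> LSTM classifier. Assumed libraries: librosa, TensorFlow/Keras.
import numpy as np
import librosa
from tensorflow.keras import layers, models

def extract_features(wav_path):
    """Return a 180-dim vector: time-averaged MFCC (40), Mel spectrogram (128), chroma (12).
    This 40/128/12 split is an assumption that matches the stated 180 dimensions."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40).mean(axis=1)            # 40
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128).mean(axis=1)  # 128
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, n_chroma=12).mean(axis=1) # 12
    return np.concatenate([mfcc, mel, chroma])                                 # 180

def build_autoencoder(input_dim=180, code_dim=500):
    """Over-complete autoencoder: 180-dim input encoded into a 500-dim deep feature."""
    inp = layers.Input(shape=(input_dim,))
    code = layers.Dense(code_dim, activation='relu')(inp)
    out = layers.Dense(input_dim, activation='linear')(code)
    autoencoder = models.Model(inp, out)
    encoder = models.Model(inp, code)
    autoencoder.compile(optimizer='adam', loss='mse')
    return autoencoder, encoder

def build_lstm_classifier(code_dim=500, n_classes=6):
    """LSTM classifier over the 500-dim code, treated here as a length-1 sequence;
    the number of emotion classes depends on the corpus (EMO-DB / CASIA)."""
    model = models.Sequential([
        layers.Input(shape=(1, code_dim)),
        layers.LSTM(128),
        layers.Dense(n_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Usage sketch: X is (n_samples, 180) from extract_features, y holds integer emotion labels.
# autoencoder, encoder = build_autoencoder()
# autoencoder.fit(X, X, epochs=50, batch_size=32)
# Z = encoder.predict(X)[:, None, :]            # (n_samples, 1, 500) for the LSTM
# clf = build_lstm_classifier(n_classes=len(set(y)))
# clf.fit(Z, y, epochs=50, batch_size=32)
```

The over-complete encoder (500 > 180) mirrors the abstract's claim that a higher-dimensional, deeper representation helps the classifier; how the paper regularizes it and how utterances are framed into sequences for the LSTM is not stated in the record.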

Keywords: autoencoder; long short-term memory; mixed speech emotion recognition

CLC number: TP391.9 [Automation and Computer Technology / Computer Application Technology]

 
