Speech emotion recognition model combining two-channel CNN-LSTM and attention mechanism

Authors: SHEN Yan; LI Hongyan [1]; MENG Zhihong; ZHANG Licai (College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Jinzhong 030600, China)

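Affiliation: [1] College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Jinzhong 030600, Shanxi, China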

Source: Electronic Design Engineering (《电子设计工程》), 2024, Issue 18, pp. 6-11 (6 pages)

Funding: National Natural Science Foundation of China (62201377); Shanxi Province Research Funding Project for Returned Overseas Scholars (2022-072).

Abstract: Existing speech emotion recognition methods based on convolutional neural networks suffer from insufficient feature extraction and poor recognition performance. To address these problems, a speech emotion recognition model combining two-channel CNN-LSTM and an attention mechanism is proposed. The model uses a two-path multi-dimensional multi-scale feature extraction method that combines residual blocks and multi-scale convolution to extract deep features from MFCC, Chroma, and spectrogram inputs, increasing feature diversity. An attention mechanism computes self-attention and cross-attention parameters for the two-path features, assigns different weight coefficients, and performs weighted fusion, which integrates complementary information and reduces the effect of feature redundancy. An LSTM network then extracts temporal features to capture contextual semantic information, and a Softmax function classifies the emotions. The classification accuracy on the RAVDESS and SEWA datasets is 90.19% and 89.23%, respectively.
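The self-attention and cross-attention fusion of the two feature paths described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the sketch assumes plain scaled dot-product attention without learned projections, two paths with the same number of frames, and a hypothetical fixed mixing weight `alpha` in place of the learned weight coefficients. The names `attention`, `fuse_two_paths`, `path_a`, and `path_b` are illustrative only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, key, value):
    """Scaled dot-product attention without learned projections.

    query is T_q x d, key is T_k x d, value is T_k x d_v (lists of lists).
    Returns a T_q x d_v list of lists.
    """
    d = len(query[0])
    out = []
    for q in query:
        # Similarity of this query frame to every key frame.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in key]
        weights = softmax(scores)
        # Weighted sum of the value frames.
        out.append([sum(w * v[j] for w, v in zip(weights, value))
                    for j in range(len(value[0]))])
    return out


def fuse_two_paths(path_a, path_b, alpha=0.5):
    """Fuse two equal-length feature sequences with self- and cross-attention.

    alpha is a hypothetical fixed mixing weight (an assumption, not from the
    paper): alpha weights the self-attended features, 1 - alpha the
    cross-attended ones.
    """
    self_a = attention(path_a, path_a, path_a)
    self_b = attention(path_b, path_b, path_b)
    cross_a = attention(path_a, path_b, path_b)  # path A attends to path B
    cross_b = attention(path_b, path_a, path_a)  # path B attends to path A
    fused = []
    for sa, ca, sb, cb in zip(self_a, cross_a, self_b, cross_b):
        a = [alpha * x + (1 - alpha) * y for x, y in zip(sa, ca)]
        b = [alpha * x + (1 - alpha) * y for x, y in zip(sb, cb)]
        fused.append(a + b)  # concatenate both paths for each frame
    return fused


# Toy input: 3 frames, 2 features per path (made-up numbers).
path_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
path_b = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
fused = fuse_two_paths(path_a, path_b)
print(len(fused), len(fused[0]))  # 3 frames, 4 fused features each
```

In a full model along the lines of the abstract, a fused sequence of this kind would be passed to an LSTM and then a Softmax classifier; those stages are omitted here.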

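Keywords: emotion recognition; attention mechanism; long short-term memory network; two-path multi-dimensional multi-scale feature extraction; multi-scale convolution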

Classification code: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]
