Speech Emotion Recognition Based on SE Attention Mechanism and Deep Convolution


Authors: ZHANG Shaohua; FENG Yan; YU Renjie; XING Peiran; REN Yihao (College of Information Science and Technology, Tibet University, Lhasa 850000, China)

Affiliation: [1] College of Information Science and Technology, Tibet University, Lhasa 850000, Tibet, China

Source: Modern Electronics Technique (《现代电子技术》), 2024, No. 22, pp. 64-70 (7 pages)

Abstract: Speech emotion recognition often fails to extract the emotional features in speech comprehensively, which leads to low recognition accuracy. To address this, a dual-channel network model based on the SE attention mechanism and deep convolution is proposed. First, speed augmentation is used to expand the original datasets, and a mixed feature map composed of the Mel spectrogram and its first-order and second-order differences is selected as input to obtain more comprehensive speech signal features. Then, Ghost convolutions are added before and after the SE attention channel to extract local features, while convolutional layers and pointwise convolutions are introduced before and after the deep convolution channel to extract global features; the features are then fused by a feature fusion layer. Finally, training and recognition are carried out using exponential decay. The results show that, on the augmented Chinese dataset CASIA and the English datasets SAVEE and eNTERFACE05, the proposed model achieves higher accuracy than other deep convolutional neural network models, verifying its effectiveness and generalization ability.
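The mixed input described in the abstract (a Mel spectrogram stacked with its first-order and second-order differences) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the differences are computed, so the sketch assumes the common regression-based delta with edge replication, and the names `delta`, `stack_features`, and `width` are hypothetical. The Mel spectrogram itself is assumed to be precomputed as a non-empty matrix of shape (n_mels, n_frames).

```python
from typing import List

Matrix = List[List[float]]  # assumed layout: (n_mels, n_frames)


def delta(feat: Matrix, width: int = 2) -> Matrix:
    """Regression-based difference along the time axis.

    Assumption: d[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2),
    with the first and last frames replicated at the edges.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: Matrix = []
    for row in feat:
        # Replicate edge frames so every frame has `width` neighbours.
        padded = [row[0]] * width + list(row) + [row[-1]] * width
        out.append([
            sum(
                n * (padded[t + width + n] - padded[t + width - n])
                for n in range(1, width + 1)
            ) / denom
            for t in range(len(row))
        ])
    return out


def stack_features(mel: Matrix, width: int = 2) -> List[Matrix]:
    """Stack static, first-order, and second-order features as 3 channels."""
    d1 = delta(mel, width)   # first-order difference
    d2 = delta(d1, width)    # second-order difference (delta of the delta)
    return [mel, d1, d2]     # shape: (3, n_mels, n_frames)


if __name__ == "__main__":
    # Toy "spectrogram": one band rising linearly, one constant band.
    mel = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [2.0] * 6]
    channels = stack_features(mel)
    print(len(channels), len(channels[0]), len(channels[0][0]))
```

The resulting three-channel array has the same layout as an RGB image, which is what lets a 2-D convolutional network consume it directly. A constant band yields zero first- and second-order differences, and a linearly rising band yields a first-order difference equal to its slope away from the edges.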

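Keywords: speech emotion recognition; dual channel; SE attention mechanism; data augmentation; Ghost convolution; deep convolution; pointwise convolution; feature fusion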

CLC number: TN912-34 [Electronics and Telecommunications: Communication and Information Systems]

 
