Authors: ZHANG Shaohua (张少华); FENG Yan (冯炎); YU Renjie (余仁杰); XING Peiran (邢沛然); REN Yihao (任艺昊)
Affiliation: [1] College of Information Science and Technology, Tibet University, Lhasa 850000, Tibet, China
Source: Modern Electronics Technique (《现代电子技术》), 2024, No. 22, pp. 64-70 (7 pages)
Abstract: Speech emotion recognition often fails to extract the emotional features in speech comprehensively, which leads to low recognition accuracy. To address this, a dual-channel network model based on the SE attention mechanism and deep convolution is proposed. First, speed augmentation is applied to expand the original datasets, and a mixed feature map consisting of the Mel spectrogram and its first-order and second-order differences is used as input to obtain more comprehensive speech signal features. Next, in the SE attention channel, Ghost convolutions are added before and after the SE attention mechanism to extract local features; in the deep convolution channel, a convolutional layer and a pointwise convolution are introduced before and after the deep convolution to extract global features. The outputs of the two channels are combined by a feature fusion layer. Finally, the model is trained and evaluated using exponential decay. The results show that, on the augmented Chinese dataset CASIA and the English datasets SAVEE and eNTERFACE05, the proposed model achieves higher accuracy than other deep convolutional neural network models, which verifies its effectiveness and generalization ability.
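As a rough illustration of the SE (squeeze-and-excitation) attention step named in the abstract, the sketch below reweights three input channels standing in for the Mel spectrogram and its first- and second-order differences. It is a minimal pure-Python sketch of the generic SE block, not the authors' network: the function name `se_block`, the toy 2x2 feature maps, the single hidden unit, and the weight values `w1` and `w2` are all assumptions made for illustration, and bias terms are omitted.

```python
import math


def se_block(feature_maps, w1, w2):
    """Apply squeeze-and-excitation to a list of C channels (each a 2-D list).

    w1 has shape (hidden, C) and w2 has shape (C, hidden); both are
    hypothetical weights that a real model would learn during training.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C channels; sigmoid keeps gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_maps, gates)
    ]
    return scaled, gates


# Toy stand-ins for the three input feature maps (values are made up).
mel = [[1.0, 2.0], [3.0, 4.0]]
delta = [[0.5, 0.5], [0.5, 0.5]]
delta2 = [[0.0, 0.0], [0.0, 0.0]]

# Hypothetical weights: 3 channels -> 1 hidden unit -> 3 channels.
w1 = [[0.2, -0.1, 0.4]]
w2 = [[1.0], [0.0], [-1.0]]

scaled, gates = se_block([mel, delta, delta2], w1, w2)
print(gates)
```

With these toy weights the first channel receives a gate above 0.5 and the third a gate below 0.5, so the block emphasizes one channel and suppresses another while leaving the spatial layout of each feature map unchanged. In a trained network the gates would depend on learned weights rather than the hand-picked values used here.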
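Keywords: speech emotion recognition; dual channel; SE attention mechanism; data augmentation; Ghost convolution; deep convolution; pointwise convolution; feature fusion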
Classification code: TN912-34 [Electronics and Telecommunications: Communication and Information Systems]