Speech Emotion Recognition Based on Heterogeneous Parallel Neural Network

Cited by: 9

Authors: ZHANG Huiyun, HUANG Heming [1,2]

Affiliations: [1] Computer College, Qinghai Normal University, Xining 810008, China; [2] State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining 810008, China

Source: Computer Engineering, 2022, Issue 4, pp. 113-118 (6 pages)

Funding: National Natural Science Foundation of China (62066039).

Abstract: The core of a Speech Emotion Recognition (SER) system is to extract features that represent speech emotion well and to build an acoustic model with strong robustness and generalization. This study constructs AHPCL, an attention-based heterogeneous parallel neural network model for SER. A Long Short-Term Memory (LSTM) network extracts the time-series features of speech emotion, and convolution operations extract the spatial spectral features of speech. Combining temporal and spatial information to jointly represent speech emotion improves the accuracy of the predictions. The attention mechanism assigns weights according to how much each time-series feature contributes to speech emotion, so that the time series that best represent speech emotion are selected from a large amount of feature information. Low-level descriptor features such as pitch, Zero Crossing Rate (ZCR), and Mel-Frequency Cepstral Coefficients (MFCC) are extracted from three speech emotion databases, CASIA, EMODB, and SAVEE, and high-level statistical functions of these descriptors are computed, giving 219-dimensional features that serve as the input for experimental validation. The results show that the AHPCL model achieves an Unweighted Average Recall (UAR) of 86.02%, 84.03%, and 64.06% on CASIA, EMODB, and SAVEE, respectively, and exhibits greater robustness and generalization than the LeNet, DNN-ELM, and TSFFCNN baseline models.
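Two ideas in the abstract can be illustrated with a short Python sketch: weighting time-series features by their contribution, and scoring with unweighted average recall. The abstract does not give the exact attention formulation, so the sketch assumes a plain softmax over per-frame relevance scores. The function names and the way scores are supplied are hypothetical, and this is not the authors' implementation.

```python
import math

def attention_pool(frames, scores):
    """Softmax-weighted sum of per-frame feature vectors.

    frames: list of T feature vectors (e.g. LSTM outputs), each of equal length.
    scores: list of T relevance scores (assumed given here; in a real model
            they would be produced by learned parameters).
    Returns (pooled_vector, weights).
    """
    m = max(scores)                                  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]              # non-negative, sum to 1
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally
    regardless of how many samples it has."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)
```

UAR differs from plain accuracy on imbalanced data: a classifier that always predicts the majority class can have high accuracy while its UAR stays low, because every minority class contributes a recall of zero.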

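Keywords: speech emotion recognition; spectral features; prosodic features; attention mechanism; heterogeneous parallel branches; recurrent neural network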

Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
