Authors: ZHANG Huiyun, HUANG Heming [1,2]
Affiliations: [1] Computer College, Qinghai Normal University, Xining 810008, China; [2] State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining 810008, China
Source: Computer Engineering (《计算机工程》), 2022, No. 4, pp. 113-118 (6 pages)
Funding: National Natural Science Foundation of China (62066039).
Abstract: The core of a Speech Emotion Recognition (SER) system is to extract features that represent speech emotion well and to construct an acoustic model with strong robustness and generalization. In this study, an attention-based heterogeneous parallel Recurrent Neural Network (RNN) model, AHPCL, is constructed for SER. A Long Short-Term Memory (LSTM) network extracts the time-series features of speech emotion, and a convolution operation extracts the spatial spectral features of speech. Combining temporal and spatial information to jointly represent speech emotion improves the accuracy of the predictions. The attention mechanism assigns weights according to how much each time-series feature contributes to speech emotion, so that the time sequences that best represent speech emotion are selected from a large amount of feature information. Low-level descriptor features such as pitch, Zero Crossing Rate (ZCR), and Mel-Frequency Cepstrum Coefficient (MFCC) are extracted from three speech emotion databases, namely CASIA, EMODB, and SAVEE, and high-level statistical functions of these low-level descriptors are computed to obtain 219-dimensional features, which serve as the model input for experimental validation. The results show that the AHPCL model achieves an Unweighted Average Recall (UAR) of 86.02%, 84.03%, and 64.06% on the CASIA, EMODB, and SAVEE databases, respectively. Compared with the LeNet, DNN-ELM, and TSFFCNN baseline models, the AHPCL model exhibits greater robustness and generalization.
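As a rough illustration of two ideas named in the abstract, the sketch below computes Unweighted Average Recall (the mean of per-class recalls) and performs a softmax-weighted pooling over time steps. It is not the authors' implementation: the function names `unweighted_average_recall` and `attention_pool` are hypothetical, and the attention scores are passed in as plain numbers, whereas in a model such as AHPCL they would presumably be learned.

```python
import math
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally, whatever its size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def attention_pool(frames, scores):
    """Softmax the per-step scores, then return (weights, weighted sum of frames).

    Assumption: `scores` are given; a real model would learn them.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return weights, pooled


# Toy data (made up for illustration): the majority class is always right,
# the minority class is always wrong.
y_true = ["angry"] * 4 + ["sad"]
y_pred = ["angry"] * 5
uar = unweighted_average_recall(y_true, y_pred)

# Two time steps with equal scores receive equal attention weights.
weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

On the toy labels, plain accuracy would be 4/5 while UAR is 0.5, which is why UAR is a common choice when emotion classes are unevenly represented.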
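Keywords: speech emotion recognition; spectral features; prosodic features; attention mechanism; heterogeneous parallel branches; recurrent neural network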
Classification number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]