Speech Emotion Recognition TSTNet Based on Spatial-temporal Features  Cited by: 4

Authors: XUE Junxiao[1,2]; HUANG Shibo; WANG Yabo; ZHANG Chaoyang; SHI Lei[1,2]

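Affiliations: [1] School of Software, Zhengzhou University, Zhengzhou 450002, Henan, China; [2] School of Cyberspace Security, Zhengzhou University, Zhengzhou 450002, Henan, China; [3] School of Information Engineering, Zhengzhou University, Zhengzhou 450001, Henan, China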

Source: Journal of Zhengzhou University (Engineering Science), 2021, No. 6, pp. 28-33 (6 pages)

Funding: Training Program for Young Backbone Teachers in Higher Education Institutions of Henan Province (22020GGJS014).

Abstract: Social speech varies in tone, pitch, and speaking rate, and padding audio to a common length can introduce information loss or redundancy. To address these problems, a speech emotion recognition method based on spatial-temporal features is proposed. The method combines a convolutional neural network (CNN) with a bidirectional recurrent neural network (BiGRU) and consists of three modules: spatial feature extraction, temporal feature extraction, and feature fusion. Because audio clips differ in length, the audio is first preprocessed with three zero-padding schemes to obtain spectrograms at different scales. The spatial feature extraction module captures local features of the audio, and the temporal feature extraction module captures temporal features and contextual semantic relationships, yielding three spatial-temporal feature vectors. These spatial-temporal feature vectors are then fused and passed to a fully connected layer for speech emotion classification. In numerical experiments on the IFLYTEK speech emotion dataset, the method achieved better results than traditional speech emotion recognition models on four metrics: accuracy, precision, recall, and F1 score.
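The preprocessing step can be illustrated with a minimal sketch. The abstract does not say how the three zero-padding schemes differ, so this example assumes they are three fixed target lengths in frames; `pad_or_truncate`, `multi_scale_inputs`, and the lengths used are hypothetical names and values, not taken from the paper. A spectrogram is represented as a plain list of frames, each frame being a list of frequency-bin values.

```python
from typing import List, Sequence

Spectrogram = List[List[float]]  # frames x frequency bins


def pad_or_truncate(spec: Spectrogram, target_frames: int) -> Spectrogram:
    """Return a copy of `spec` with exactly `target_frames` frames.

    Short inputs get all-zero frames appended; long inputs are cut off.
    """
    if target_frames <= 0:
        raise ValueError("target_frames must be positive")
    n_bins = len(spec[0]) if spec else 0
    # Copy at most `target_frames` frames so the input is never mutated.
    result = [list(frame) for frame in spec[:target_frames]]
    # Append zero frames until the target length is reached.
    missing = target_frames - len(result)
    result.extend([0.0] * n_bins for _ in range(missing))
    return result


def multi_scale_inputs(
    spec: Spectrogram, target_lengths: Sequence[int]
) -> List[Spectrogram]:
    """Build one fixed-length spectrogram per assumed target length."""
    return [pad_or_truncate(spec, t) for t in target_lengths]


# Toy spectrogram: 3 frames, 2 frequency bins each.
spec = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Assumed (hypothetical) target lengths: one shorter, one equal, one longer.
scales = multi_scale_inputs(spec, target_lengths=(2, 3, 5))
print([len(s) for s in scales])  # number of frames at each scale
```

In this reading, a short target length discards trailing frames (information loss) while a long one appends zero frames (redundancy), which is the trade-off the abstract describes. Each of the three outputs would then feed its own spatial and temporal feature extraction path before fusion.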

Keywords: speech emotion recognition; spectrogram; spatial-temporal features

Classification code: TP39 [Automation and Computer Technology: Computer Application Technology]

 
