Authors: CUI Chen-lu; CUI Lin[1,2]
Affiliations: [1] School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China; [2] School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China
Source: Computer and Modernization (《计算机与现代化》), 2023, No. 4, pp. 83-89, 100 (8 pages)
Funding: Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 61901347)
Abstract: Speech emotion recognition with deep learning typically requires a large amount of training data. Existing speech emotion databases are scarce, and their small size easily leads to overfitting. To address this, in the preprocessing stage the original speech is augmented by adding Gaussian white noise and by shifting the waveform to generate new speech signals, which not only improves recognition accuracy but also enhances the robustness of the model. Meanwhile, because ordinary convolutional neural networks have an excessive number of parameters, a lightweight model composed of separable convolutions and gated recurrent units is proposed. First, MFCC features are extracted from the raw speech and used as the model input; next, separable convolutions extract the spatial information of the speech while gated recurrent units extract its temporal information, and characterizing speech emotion with both temporal and spatial information makes the predictions more accurate; finally, the features are fed into a fully connected layer with softmax to complete emotion classification. Experimental results show that, compared with the baseline model, the proposed model achieves higher accuracy while reducing the model size by about 50%.
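The two augmentation operations named in the abstract (additive Gaussian white noise and waveform shifting) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the SNR-based noise scaling, and the circular-shift variant are all assumptions, since the abstract does not specify noise levels or shift amounts.

```python
import numpy as np

def add_gaussian_noise(signal, snr_db=20.0, rng=None):
    # Assumed parameterization: scale white Gaussian noise to a target
    # signal-to-noise ratio in dB (the paper does not state the noise level).
    rng = rng or np.random.default_rng()
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def shift_waveform(signal, shift):
    # One simple form of time shifting: circularly roll the samples.
    # The paper's exact shifting scheme (circular vs. zero-padded) is not given.
    return np.roll(signal, shift)

# Example: augment a 1-second 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)
noisy = add_gaussian_noise(x, snr_db=20.0)
shifted = shift_waveform(x, shift=1600)  # shift by 0.1 s
```

Each augmented signal keeps the original length, so the same MFCC extraction pipeline can be applied to original and augmented utterances alike.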
Keywords: speech emotion recognition; data augmentation; Gaussian white noise; waveform shift; parameter count
Classification: TP391 [Automation and Computer Technology — Computer Application Technology]