Authors: CUI Chen-lu; CUI Lin[1,2]
Affiliations: [1] School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China; [2] School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China
Source: Computer and Modernization (《计算机与现代化》), 2023, No. 4, pp. 83-89, 100 (8 pages)
Funding: Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 61901347)
Abstract: Speech emotion recognition with deep learning typically requires a large amount of training data. Existing speech emotion databases are scarce, and their small size easily leads to overfitting. To address this, in the preprocessing stage the original speech is augmented by adding Gaussian white noise and by shifting the waveform to generate new speech signals, which not only improves recognition accuracy but also enhances the robustness of the model. Meanwhile, because ordinary convolutional neural networks have an excessive number of parameters, a lightweight model composed of separable convolutions and gated recurrent units is proposed. First, MFCC features are extracted from the raw speech and used as the model input; next, separable convolutions extract the spatial information of the speech while gated recurrent units extract its temporal information, and characterizing speech emotion with both temporal and spatial information makes the predictions more accurate; finally, the features are fed into a fully connected layer with softmax to complete emotion classification. Experimental results show that, compared with the baseline model, the proposed model achieves higher accuracy while reducing the model size by about 50%.
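The two augmentation operations named in the abstract (additive Gaussian white noise and waveform shifting) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the SNR-based noise scaling, and the circular-shift variant are all assumptions, since the abstract does not specify noise levels or shift amounts.

```python
import numpy as np

def add_gaussian_noise(signal, snr_db=20.0, rng=None):
    # Assumed parameterization: scale white Gaussian noise to a target
    # signal-to-noise ratio in dB (the paper does not state the noise level).
    rng = rng or np.random.default_rng()
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def shift_waveform(signal, shift):
    # One simple form of time shifting: circularly roll the samples.
    # The paper's exact shifting scheme (circular vs. zero-padded) is not given.
    return np.roll(signal, shift)

# Example: augment a 1-second 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)
noisy = add_gaussian_noise(x, snr_db=20.0)
shifted = shift_waveform(x, shift=1600)  # shift by 0.1 s
```

Each augmented signal keeps the original length, so the same MFCC extraction pipeline can be applied to original and augmented utterances alike.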
Keywords: speech emotion recognition; data augmentation; Gaussian white noise; waveform shift; parameter count
Classification: TP391 [Automation and Computer Technology — Computer Application Technology]