Improved Chinese speech emotion recognition network based on residual network

Cited by: 4


Authors: JIA Jing-wen (贾婧雯); CAI Ying (蔡英); ERGU Daji (尔古打机) [1] (College of Electronic and Information, Southwest Minzu University, Chengdu 610000, China)

Affiliation: [1] College of Electronic and Information, Southwest Minzu University, Chengdu, Sichuan 610000, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2023, No. 3, pp. 922-928 (7 pages)

Funding: Southwest Minzu University Graduate Innovation Research Fund Project (CX2021SZ38).

Abstract: To address the low accuracy of Chinese speech emotion recognition with small samples, AResnet, an improved Chinese speech emotion recognition network structure based on residual networks, is proposed. Time-domain and frequency-domain augmentation are used to generate more complex simulated samples that expand the speech emotion data, and an attention mechanism is introduced into the residual network to focus on the distribution of emotional features in the spectrogram and improve the emotion recognition rate. The network is trained and tested on the CASIA Chinese speech dataset. Compared with the DCNN+LSTM and Trumpt-6 network structures, the recognition rate improves by about 14.9% and 3%, respectively, which verifies the effectiveness of AResnet for Chinese speech emotion recognition. The method is also evaluated on the English speech dataset eNTERFACE’05, where it reaches a recognition accuracy of 92%, indicating that AResnet generalizes well.
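The abstract names time-domain and frequency-domain augmentation but does not say which transforms were used. As a rough illustration only, the sketch below assumes a random circular shift plus additive white noise on the waveform, and random band masking on a magnitude spectrogram. The function names, parameters, and default values are hypothetical and are not taken from the paper.

```python
import numpy as np


def time_domain_augment(wave, rng, noise_snr_db=20.0, max_shift=0.1):
    """Assumed time-domain step: circular shift plus white noise at a target SNR."""
    n = len(wave)
    limit = int(max_shift * n)
    shift = int(rng.integers(-limit, limit + 1))
    shifted = np.roll(wave, shift)
    signal_power = np.mean(shifted ** 2) + 1e-12
    noise_power = signal_power / (10 ** (noise_snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=n)
    return shifted + noise


def magnitude_spectrogram(wave, frame_len=256, hop=128):
    """Hann-windowed short-time magnitude spectrum, shaped (freq_bins, frames)."""
    n_frames = 1 + (len(wave) - frame_len) // hop
    frames = np.stack([wave[i * hop:i * hop + frame_len] for i in range(n_frames)])
    window = np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames * window, axis=1)).T


def frequency_domain_augment(spec, rng, max_freq_mask=8, max_time_mask=16):
    """Assumed spectrogram step: zero one random frequency band and one time span."""
    out = spec.copy()  # leave the input spectrogram untouched
    n_freq, n_time = out.shape
    f = int(rng.integers(0, min(max_freq_mask, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0
    t = int(rng.integers(0, min(max_time_mask, n_time) + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out


# Usage on a synthetic one-second tone (a stand-in for a real utterance).
rng = np.random.default_rng(0)
wave = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
augmented_wave = time_domain_augment(wave, rng)
augmented_spec = frequency_domain_augment(magnitude_spectrogram(augmented_wave), rng)
```

Each call draws new random parameters, so one recording can yield several distinct training samples, which is the general idea behind expanding a small dataset.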

Keywords: speech emotion recognition; deep learning; residual network; attention mechanism; small sample; data augmentation; spectrogram

Classification code: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]

 
