Authors: JIA Jing-wen (贾婧雯); CAI Ying (蔡英); ERGU Daji (尔古打机) [1]
Affiliation: [1] College of Electronic and Information, Southwest Minzu University, Chengdu 610000, Sichuan, China
Source: Computer Engineering and Design (《计算机工程与设计》), 2023, No. 3, pp. 922-928 (7 pages)
Fund: Graduate Innovation Research Fund of Southwest Minzu University (CX2021SZ38).
Abstract: To address the low accuracy of Chinese speech emotion recognition under small-sample conditions, AResnet, an improved Chinese speech emotion recognition network structure based on residual networks, is proposed. Time-domain and frequency-domain augmentation are used to generate more complex simulated samples that expand the speech emotion data, and an attention mechanism is introduced into the residual network to focus on the distribution of emotion features in the spectrogram and improve the recognition rate. Training and testing on the CASIA Chinese speech dataset show that, compared with the DCNN+LSTM and Trumpt-6 network structures, the recognition rate improves by about 14.9% and 3% respectively, which verifies the effectiveness of AResnet for Chinese speech emotion recognition. The method is also evaluated on the English speech dataset eNTERFACE'05, where it reaches a recognition accuracy of 92%, indicating that AResnet has good generalization ability.
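The abstract mentions time-domain and frequency-domain augmentation of the speech data but does not spell out the operations. The following is a minimal Python sketch, assuming a SpecAugment-style masking scheme applied to the spectrogram, in which a random band of frequency bins and a random span of time frames are zeroed out; the function name augment_spectrogram and the mask sizes are illustrative assumptions, not the authors' published procedure.

import numpy as np

def augment_spectrogram(spec, max_freq_mask=8, max_time_mask=20, rng=None):
    """Return a masked copy of a (freq_bins, time_frames) spectrogram."""
    rng = rng if rng is not None else np.random.default_rng()
    out = spec.copy()
    n_freq, n_time = out.shape

    # Frequency-domain augmentation: zero out a random band of frequency bins.
    f = int(rng.integers(0, max_freq_mask + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time-domain augmentation: zero out a random span of time frames.
    t = int(rng.integers(0, max_time_mask + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out

# Example: mask a fake 64x100 log-Mel spectrogram.
fake_spec = np.random.randn(64, 100).astype(np.float32)
augmented = augment_spectrogram(fake_spec)

Applying such masking to each training spectrogram yields the "more complex simulated samples" the abstract refers to, while leaving the original labels unchanged.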
Keywords: speech emotion recognition; deep learning; residual network; attention mechanism; small sample; data augmentation; spectrogram
CLC number: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]
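The record describes AResnet only as a residual network with an attention mechanism that focuses on the distribution of emotion features in the spectrogram. Below is a minimal PyTorch sketch of one possible attention-augmented residual block; the use of squeeze-and-excitation style channel attention and the names ChannelAttention and AResBlock are assumptions made for illustration, since the abstract does not specify the attention type or its placement.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Re-weight channels so the block can emphasize emotion-relevant
    regions of the spectrogram feature maps (an assumed attention form)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w

class AResBlock(nn.Module):
    """Residual block with channel attention on the residual branch."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            ChannelAttention(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))

# Example: one block applied to feature maps of a batch of spectrograms.
x = torch.randn(4, 32, 64, 100)   # (batch, channels, freq_bins, time_frames)
y = AResBlock(32)(x)
print(y.shape)                    # torch.Size([4, 32, 64, 100])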