Authors: LIU Ze-hao, DONG Hu [2], ZHAO Xin-min, QIAN Sheng-you (School of Physics and Electronics, Hunan Normal University, Changsha 410081, China; Changsha Normal University, Changsha 410100, China)
Affiliations: [1] School of Physics and Electronics, Hunan Normal University, Changsha 410081, Hunan, China; [2] Changsha Normal University, Changsha 410100, Hunan, China
Source: Computer and Information Technology (《电脑与信息技术》), 2024, No. 6, pp. 38-42 (5 pages)
Funding: Humanities and Social Sciences Youth Fund of the Ministry of Education (Project No. 22YJCZH025); Hunan Provincial Education Science "14th Five-Year Plan" Project (Project No. XJK23BXX003)
Abstract: To address the low accuracy of emotion recognition in human-computer interaction and the failure to fully exploit features from different modalities, a speech emotion recognition method that fuses audio and text features with deep learning is proposed. The speech and text emotion recognition modules are fused at the feature level to obtain the STE-ER model. Experimental results on the public IEMOCAP dataset show that using HuBERT for feature extraction in the SPEECH module improves the emotion recognition rate by 7.1% over the spectrogram method, and that the BERT encoder used in the TEXT module improves the recognition rate by 5.1% over Word2Vec. Compared with the two independent modules, every fusion strategy clearly improves recognition accuracy; in particular, the feature-level-fusion STE-ER model achieves a recognition rate 5.2% higher than maximum-confidence decision-level fusion.
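The feature-level fusion described in the abstract can be illustrated with a minimal sketch, assuming pretrained HuBERT and BERT encoders from the Hugging Face transformers library. The checkpoint names, the mean/CLS pooling, the fusion layer width, and the four-class label set (a common IEMOCAP setup) are assumptions for illustration, not details taken from the paper.

import torch
import torch.nn as nn
from transformers import HubertModel, BertModel

class FeatureLevelFusionER(nn.Module):
    """Hypothetical STE-ER-style model: concatenate utterance-level
    HuBERT (speech) and BERT (text) features, then classify emotion."""

    def __init__(self, num_emotions: int = 4):  # 4 classes is an assumption
        super().__init__()
        self.speech_encoder = HubertModel.from_pretrained("facebook/hubert-base-ls960")
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        fused = self.speech_encoder.config.hidden_size + self.text_encoder.config.hidden_size
        self.classifier = nn.Sequential(
            nn.Linear(fused, 256),  # fusion layer width is illustrative
            nn.ReLU(),
            nn.Linear(256, num_emotions),
        )

    def forward(self, input_values, input_ids, attention_mask):
        # Mean-pool HuBERT frame features into one utterance-level vector.
        speech_feat = self.speech_encoder(input_values).last_hidden_state.mean(dim=1)
        # Use BERT's [CLS] token as the utterance-level text feature.
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        # Feature-level fusion: concatenate both vectors before the classifier,
        # as opposed to combining per-module decisions afterward.
        return self.classifier(torch.cat([speech_feat, text_feat], dim=-1))

In use, the raw 16 kHz waveform would be prepared with the matching HuBERT feature extractor and the transcript tokenized with the BERT tokenizer; a decision-level baseline would instead run two separate classifiers and pick the prediction with the highest confidence.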
Classification: TP391.41 [Automation and Computer Technology - Computer Application Technology]; TP391.1 [Automation and Computer Technology - Computer Science and Technology]