Authors: LIU Ze-hao, DONG Hu [2], ZHAO Xin-min, QIAN Sheng-you (School of Physics and Electronics, Hunan Normal University, Changsha 410081, China; Changsha Normal University, Changsha 410100, China)
Affiliations: [1] School of Physics and Electronics, Hunan Normal University, Changsha 410081, Hunan, China; [2] Changsha Normal University, Changsha 410100, Hunan, China
Source: Computer and Information Technology (《电脑与信息技术》), 2024, No. 6, pp. 38-42 (5 pages)
Funding: Humanities and Social Sciences Youth Fund of the Ministry of Education (Project No. 22YJCZH025); Hunan Provincial Education Science "14th Five-Year Plan" Project (Project No. XJK23BXX003)
Abstract: To address the low accuracy of emotion recognition in human-computer interaction and the failure to fully exploit features from different modalities, a speech emotion recognition method that fuses audio and text features with deep learning is proposed. The speech and text emotion recognition modules are fused at the feature level to obtain the STE-ER model. Experimental results on the public IEMOCAP dataset show that using HuBERT for feature extraction in the SPEECH module improves the emotion recognition rate by 7.1% over the spectrogram method, and that the BERT encoder used in the TEXT module improves the recognition rate by 5.1% over Word2Vec. Compared with the two independent modules, every fusion strategy clearly improves recognition accuracy; in particular, the feature-level-fusion STE-ER model achieves a recognition rate 5.2% higher than maximum-confidence decision-level fusion.
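The feature-level fusion described in the abstract can be illustrated with a minimal sketch, assuming pretrained HuBERT and BERT encoders from the Hugging Face transformers library. The checkpoint names, the mean/CLS pooling, the fusion layer width, and the four-class label set (a common IEMOCAP setup) are assumptions for illustration, not details taken from the paper.

import torch
import torch.nn as nn
from transformers import HubertModel, BertModel

class FeatureLevelFusionER(nn.Module):
    """Hypothetical STE-ER-style model: concatenate utterance-level
    HuBERT (speech) and BERT (text) features, then classify emotion."""

    def __init__(self, num_emotions: int = 4):  # 4 classes is an assumption
        super().__init__()
        self.speech_encoder = HubertModel.from_pretrained("facebook/hubert-base-ls960")
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        fused = self.speech_encoder.config.hidden_size + self.text_encoder.config.hidden_size
        self.classifier = nn.Sequential(
            nn.Linear(fused, 256),  # fusion layer width is illustrative
            nn.ReLU(),
            nn.Linear(256, num_emotions),
        )

    def forward(self, input_values, input_ids, attention_mask):
        # Mean-pool HuBERT frame features into one utterance-level vector.
        speech_feat = self.speech_encoder(input_values).last_hidden_state.mean(dim=1)
        # Use BERT's [CLS] token as the utterance-level text feature.
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        # Feature-level fusion: concatenate both vectors before the classifier,
        # as opposed to combining per-module decisions afterward.
        return self.classifier(torch.cat([speech_feat, text_feat], dim=-1))

In use, the raw 16 kHz waveform would be prepared with the matching HuBERT feature extractor and the transcript tokenized with the BERT tokenizer; a decision-level baseline would instead run two separate classifiers and pick the prediction with the highest confidence.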
Classification: TP391.41 [Automation and Computer Technology - Computer Application Technology]; TP391.1 [Automation and Computer Technology - Computer Science and Technology]