Speech Emotion Recognition Method Fusion with Multi-feature

Cited by: 10


Authors: WANG Yi[1], WANG Li-ming[1], CHAI Yu-mei[1]

Affiliation: [1] School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2022, No. 6, pp. 1232-1239 (8 pages)

Funding: Supported by the National Natural Science Foundation of China (U1636111).

Abstract: Speech emotion recognition has become an important part of next-generation human-computer interaction technology, and extracting emotion-related features from speech signals is one of its main challenges. To address the low accuracy of emotion recognition based on a single feature, this paper proposes a feature-level and decision-level fusion method that combines acoustic and semantic features. Three types of acoustic features are extracted first: 1) a low-level handcrafted feature set, comprising spectral, voice-quality, energy, and fundamental-frequency features together with high-level statistical functions computed over these low-level descriptors; 2) deep features extracted by a DNN from the spectral features; 3) deep features extracted by a CNN from Filter_bank features. Semantic features are extracted with a speech recognition module based on the Listen-Attend-Spell (LAS) model. The three types of acoustic features and the semantic features are then fused at the feature level, with the fusion order determined by constructing a Huffman tree. Finally, emotion recognition results are obtained for the fused features and for each of the four original feature types, and decision-level fusion is applied to these results. With this method, classification accuracy on the IEMOCAP dataset can reach 76.2%.

Keywords: speech emotion recognition; acoustic features; semantic features; feature-level and decision-level fusion

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
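
The two fusion stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which weights drive the Huffman tree, how two features are combined at each merge, or which rule performs the decision-level fusion, so everything below is an assumption made for illustration: the feature names and integer weights are hypothetical, `huffman_fusion_order` only returns the order of pairwise merges, and `fuse_decisions` uses a plain weighted average of class probabilities as a stand-in for the paper's decision-level rule.

```python
import heapq
from itertools import count


def huffman_fusion_order(weights):
    """Return the pairwise merges produced by Huffman-tree construction.

    weights: dict mapping a feature name to a non-negative weight
             (hypothetical; the abstract does not say what the weight is).
    Each merge is a tuple (left_name, right_name, merged_name).
    """
    tie = count()  # tie-breaker so equal weights are popped in insertion order
    heap = [(w, next(tie), name) for name, w in weights.items()]
    heapq.heapify(heap)
    merges = []
    while len(heap) > 1:
        # Huffman step: always combine the two lowest-weight nodes.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = f"({a}+{b})"
        merges.append((a, b, merged))
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return merges


def fuse_decisions(prob_vectors, weights=None):
    """Weighted average of class-probability vectors (assumed fusion rule).

    prob_vectors: one probability vector per classifier, all the same length.
    Returns (predicted_class_index, fused_probabilities).
    """
    if weights is None:
        weights = [1.0] * len(prob_vectors)
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    fused = [
        sum(w * p[k] for w, p in zip(weights, prob_vectors)) / total
        for k in range(n_classes)
    ]
    return fused.index(max(fused)), fused


# Hypothetical weights for the three acoustic feature types and the semantic one.
feature_weights = {"LLD": 4, "DNN": 2, "CNN": 1, "semantic": 3}
merges = huffman_fusion_order(feature_weights)

# Hypothetical two-class outputs from three classifiers.
label, fused = fuse_decisions([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
```

With these made-up weights, the two lowest-weight features (`CNN` and `DNN`) are merged first and the highest-weight feature joins last. In the setting the abstract describes, the decision-level stage would receive five result vectors: one from the fused features and one from each of the four original feature types.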

 
