Dual-feature speech emotion recognition fusion algorithm based on wavelet scattering transform and MFCC  Cited by: 2

Authors: YING Na[1]; WU Shunpeng; YANG Meng[1]; ZOU Yujian

Affiliation: [1] School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China

Source: Telecommunications Science (《电信科学》), 2024, No. 5, pp. 62-72 (11 pages)

Funding: Natural Science Foundation of Zhejiang Province (No. LTGS23F010001); Fundamental Research Funds for the Provincial Universities of Zhejiang (No. GK239909299001-406).

Abstract: To fully exploit the emotional information contained in the spectrum of speech signals and improve the accuracy of speech emotion recognition, a fusion algorithm based on the wavelet scattering transform and Mel-frequency cepstral coefficients (MFCC), using permutation entropy weighting and a bias adjustment rule (PEW-BAR), is proposed. The algorithm first extracts wavelet scattering features and MFCC-related features from the speech signal. It then expands the wavelet scattering features along the scale dimension, uses support vector machines to obtain posterior probabilities for emotion recognition, computes the permutation entropy, and weights the posterior probabilities by that entropy. Finally, a bias adjustment rule further fuses in the recognition results of the MFCC-related features. Experimental results show that, compared with traditional speech emotion recognition methods based on wavelet scattering coefficients, the algorithm improves accuracy (ACC) by 2.82%, 2.85%, and 5.92% and unweighted average recall (UAR) by 3.40%, 2.87%, and 5.80% on the EMODB, RAVDESS, and eNTERFACE05 datasets, respectively, with a 6.89% improvement on the IEMOCAP dataset.

Keywords: speech emotion recognition; wavelet scattering transform; permutation entropy; MFCC; model fusion

Classification No.: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]
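
The permutation-entropy weighting step mentioned in the abstract can be illustrated with a minimal Python sketch. The entropy function follows the standard Bandt-Pompe definition of normalized permutation entropy. Everything else is an assumption made for illustration: the abstract does not say which sequence the entropy is computed over, how the weights are derived from it, or how the bias adjustment rule works, so the names `permutation_entropy` and `entropy_weighted_fusion` and the "lower entropy gets a larger weight" rule are hypothetical and not the authors' formulation. The bias adjustment rule is not sketched.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1):
    """Normalized Bandt-Pompe permutation entropy, in the range [0, 1]."""
    n = len(series) - (order - 1) * delay
    if n <= 0:
        raise ValueError("series too short for the given order and delay")
    patterns = Counter()
    for i in range(n):
        window = [series[i + j * delay] for j in range(order)]
        # Ordinal pattern: the indices that would sort the window.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        patterns[pattern] += 1
    entropy = -sum((c / n) * math.log(c / n) for c in patterns.values())
    # Divide by log(order!) so a uniform pattern distribution gives 1.
    return entropy / math.log(math.factorial(order))


def entropy_weighted_fusion(posteriors, entropies):
    """Fuse per-branch posteriors with weights proportional to (1 - entropy).

    ASSUMPTION: this weighting rule is illustrative only; the source
    abstract does not specify how entropy is turned into weights.
    """
    raw = [1.0 - h for h in entropies]
    total = sum(raw)
    if total == 0:
        # All branches maximally irregular: fall back to uniform weights.
        weights = [1.0 / len(raw)] * len(raw)
    else:
        weights = [r / total for r in raw]
    n_classes = len(posteriors[0])
    return [
        sum(w * p[k] for w, p in zip(weights, posteriors))
        for k in range(n_classes)
    ]


if __name__ == "__main__":
    # Hypothetical posteriors from two classifier branches over two classes.
    branch_posteriors = [[0.7, 0.3], [0.4, 0.6]]
    branch_entropies = [0.2, 0.8]
    print(entropy_weighted_fusion(branch_posteriors, branch_entropies))
```

In this sketch each branch would correspond to one classifier output (for example, one scale of the expanded scattering features), and the fused vector remains a valid probability distribution because the weights sum to one.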

 
