Auditory attention model based on Chirplet for cross-corpus speech emotion recognition  Cited by: 1

A time-frequency atom auditory attention model for cross-corpus speech emotion recognition (article in English)

Authors: Zhang Xinran (张昕然)[1], Song Peng (宋鹏)[2], Zha Cheng (查诚)[1], Tao Huawei (陶华伟)[1], Zhao Li (赵力)[1]

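Affiliations: [1] Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096; [2] School of Computer and Control Engineering, Yantai University, Yantai 264005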

Source: Journal of Southeast University (English Edition), 2016, No. 4, pp. 402-407 (6 pages)

Funding: The National Natural Science Foundation of China (No. 61273266, 61231002, 61301219, 61375028); the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20110092130004); the Natural Science Foundation of Shandong Province (No. ZR2014FQ016)

Abstract: To address the feature mismatch between experimental databases, a key problem in cross-corpus speech emotion recognition, an auditory attention model based on Chirplet is proposed for feature extraction. First, to extract spectral features, the auditory attention model is employed to detect features across multiple emotion classes. Then, a selective attention mechanism is used to extract salient gist features from the spectrogram; the saliency information they carry is closely related to cross-corpus recognition performance. Furthermore, Chirplet time-frequency atoms are introduced into the model. By forming an over-complete atom dictionary, the Chirplet increases the amount of information in the spectrogram features. Because samples from multiple databases are multi-component in nature, the Chirplet expands the scale of the feature vector in the time-frequency domain. Experimental results show that, compared with the traditional feature model, the proposed feature extraction approach with the prototypical classifier significantly improves cross-corpus speech emotion recognition. In addition, the proposed method is more robust when the training set and the testing set come from different sources.
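To make the idea of a redundant dictionary of Chirplet atoms concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the atom parameterization, so the Gaussian envelope, the parameter names (`t_c`, `f_c`, `chirp_rate`, `sigma`), and the exhaustive grid search in `best_atom` are all assumptions chosen for illustration. The sketch builds unit-energy Gaussian chirplets and picks the atom whose inner product with a signal has the largest magnitude, which is the basic step of a matching-pursuit style decomposition.

```python
import cmath
import math


def chirplet_atom(n_samples, fs, t_c, f_c, chirp_rate, sigma):
    """Return a unit-energy Gaussian chirplet sampled at fs Hz.

    Assumed (hypothetical) parameterization:
      t_c        time center in seconds
      f_c        frequency at the time center in Hz
      chirp_rate linear frequency slope in Hz per second
      sigma      standard deviation of the Gaussian envelope in seconds
    """
    atom = []
    for n in range(n_samples):
        tau = n / fs - t_c
        envelope = math.exp(-0.5 * (tau / sigma) ** 2)
        phase = 2.0 * math.pi * (f_c * tau + 0.5 * chirp_rate * tau ** 2)
        atom.append(envelope * cmath.exp(1j * phase))
    # Normalize numerically so every atom has unit energy.
    norm = math.sqrt(sum(abs(a) ** 2 for a in atom))
    return [a / norm for a in atom]


def inner_product(signal, atom):
    """Complex inner product <signal, atom>."""
    return sum(s * a.conjugate() for s, a in zip(signal, atom))


def best_atom(signal, fs, t_centers, f_centers, chirp_rates, sigma):
    """Search a small parameter grid and return (score, t_c, f_c, chirp_rate)
    for the atom most correlated with the signal."""
    best = None
    for t_c in t_centers:
        for f_c in f_centers:
            for c in chirp_rates:
                atom = chirplet_atom(len(signal), fs, t_c, f_c, c, sigma)
                score = abs(inner_product(signal, atom))
                if best is None or score > best[0]:
                    best = (score, t_c, f_c, c)
    return best


# Toy check: the "signal" is itself one atom of the dictionary,
# so the search should recover its parameters.
fs = 2000.0
signal = chirplet_atom(200, fs, t_c=0.05, f_c=200.0, chirp_rate=1000.0, sigma=0.01)
result = best_atom(
    signal,
    fs,
    t_centers=[0.03, 0.05, 0.07],
    f_centers=[100.0, 200.0, 300.0],
    chirp_rates=[0.0, 1000.0, 2000.0],
    sigma=0.01,
)
print(result)
```

The extra `chirp_rate` axis is what distinguishes this dictionary from a plain Gabor dictionary: it lets an atom follow a frequency sweep. In a full decomposition, the selected atom's contribution would be subtracted from the signal and the search repeated on the residual.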

Keywords: speech emotion recognition; selective attention mechanism; spectrogram feature; cross-corpus

CLC number: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]

 
