Self-attention transfer networks for speech emotion recognition (cited by: 4)


Authors: Ziping ZHAO, Keru WANG, Zhongtian BAO, Zixing ZHANG, Nicholas CUMMINS, Shihuang SUN, Haishuai WANG, Jianhua TAO, Björn W. SCHULLER

Affiliations: [1] College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China; [2] GLAM - Group on Language, Audio & Music, Imperial College London, SW7 2AZ, UK; [3] Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, 86159, Germany; [4] Department of Biostatistics and Health Informatics, IoPPN, King's College London, London, SE5 8AF, UK; [5] Department of Computer Science and Engineering, Fairfield University, 06824, USA; [6] National Laboratory of Pattern Recognition, CASIA, Beijing 100190, China

Published in: Virtual Reality & Intelligent Hardware, 2021, No. 1, pp. 43-54 (12 pages)

Funding: the National Natural Science Foundation of China (62071330); the National Science Fund for Distinguished Young Scholars (61425017); the Key Program of the National Natural Science Foundation (61831022); the Key Program of the Natural Science Foundation of Tianjin (18JCZDJC36300); the Open Projects Program of the National Laboratory of Pattern Recognition and the Senior Visiting Scholar Program of Tianjin Normal University; the Innovative Medicines Initiative 2 Joint Undertaking (115902), which receives support from the European Union's Horizon 2020 research and innovation program and EFPIA.

Abstract: Background: A crucial element of human-machine interaction, the automatic detection of emotional states from human speech has long been regarded as a challenging task for machine learning models. One vital challenge in speech emotion recognition (SER) is learning robust and discriminative representations from speech. Although machine learning methods have been widely applied in SER research, the inadequate amount of available annotated data has become a bottleneck impeding the extended application of such techniques (e.g., deep neural networks). To address this issue, we present a deep learning method that combines knowledge transfer and self-attention for SER tasks. Herein, we apply the log-Mel spectrogram with deltas and delta-deltas as inputs. Moreover, given that emotions are time dependent, we apply temporal convolutional neural networks to model the variations in emotions. We further introduce an attention transfer mechanism, which is based on a self-attention algorithm to learn long-term dependencies. The self-attention transfer network (SATN) in our proposed approach takes advantage of attention transfer to learn attention from speech recognition, followed by transferring this knowledge into SER. An evaluation built on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset demonstrates the effectiveness of the proposed model.
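The core idea the abstract outlines, i.e. a "teacher" attention map from a speech-recognition model guiding the self-attention of a "student" SER model, can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the function names, the single-head attention, and the mean-squared-error transfer loss are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a (T, d) frame sequence.

    Returns the attended features and the (T, T) attention map, which is
    the object that attention transfer operates on.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = softmax(scores, axis=-1)        # each row sums to 1
    return attn @ v, attn

def attention_transfer_loss(student_attn, teacher_attn):
    """Penalize the distance between student and teacher attention maps
    (here: mean squared error, an illustrative choice of distance)."""
    return float(np.mean((student_attn - teacher_attn) ** 2))

# Toy setup: 5 time frames of 8-dim acoustic features (e.g., log-Mel rows).
rng = np.random.default_rng(0)
T, d = 5, 8
x = rng.standard_normal((T, d))
student_w = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
teacher_w = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]

_, a_student = self_attention(x, *student_w)   # SER model's attention
_, a_teacher = self_attention(x, *teacher_w)   # ASR model's attention
loss = attention_transfer_loss(a_student, a_teacher)
```

In training, this transfer loss would be added to the SER classification loss so that the student's attention map is pulled toward the pattern learned on the much larger speech-recognition corpus.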

Keywords: Speech emotion recognition; Attention transfer; Self-attention; Temporal convolutional neural networks (TCNs)

Classification: TN912.34 [Electronics and Telecommunication: Communication and Information Systems]

 
