Multilevel emotion recognition of audio features based on multitask learning and attention mechanism

Cited by: 1

Authors: LI Lei [1,2]; ZHU Yongtong; YANG Qi [1,2,3]; ZHAO Jinwei; MA Ke

Affiliations: [1] School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; [2] Institute of Machine Intelligence, University of Shanghai for Science and Technology, Shanghai 200093, China; [3] School of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; [4] School of Mechanical and Electrical Information, Shangqiu University, Shangqiu, Henan 476000, China

Source: Intelligent Computer and Applications, 2024, No. 1, pp. 85-94, 101 (11 pages in total)

Abstract: Traditional audio classification extracts feature vectors from single-level audio only. Even with a very large model, the excess parameters cause coupling between features, which violates the principle of "high cohesion, low coupling" in feature extraction. Observing that some emotion-related covariates are not fully exploited, this paper incorporates gender prior knowledge into the model. It also recasts multilevel audio feature classification as a multi-task problem, so that the multilevel features are decoupled before they are classified, and it designs a center loss module to further optimize the feature distribution. Experiments on the IEMOCAP dataset show that the proposed model reaches a weighted accuracy (WA) of 71.94% and an unweighted accuracy (UA) of 73.37%, improvements of 1.38% and 2.35%, respectively, over the original multilevel model. In addition, two single-level audio feature extractors are designed based on the Nlinear and Dlinear algorithms, and they achieve good results in single-level audio feature classification experiments.
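The abstract names a multi-task formulation, a gender prior, and a center loss module, but it does not give the loss itself. The following is a minimal sketch of how such an objective could be assembled, not the authors' implementation. It assumes that gender enters as an auxiliary classification task, that the center loss takes the commonly used form (half the squared distance between a feature and its class center, averaged over the batch), and that the class centers are passed in rather than learned. The function names and the weights `gender_weight` and `center_weight` are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def center_loss(features, labels, centers):
    """Batch mean of 0.5 * squared distance from each feature to its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        total += 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
    return total / len(features)


def multitask_loss(emo_logits, emo_labels, gen_logits, gen_labels,
                   features, centers, gender_weight=0.5, center_weight=0.1):
    """Emotion loss + weighted gender loss + weighted center loss.

    Both weights are hypothetical values chosen for illustration only.
    """
    n = len(features)
    emo = sum(cross_entropy(l, y) for l, y in zip(emo_logits, emo_labels)) / n
    gen = sum(cross_entropy(l, y) for l, y in zip(gen_logits, gen_labels)) / n
    return emo + gender_weight * gen + center_weight * center_loss(
        features, emo_labels, centers)
```

In this sketch the center term pulls features of the same emotion class toward a shared center, which is one way to read the abstract's "further optimize the feature distribution". In a trainable model the centers would normally be updated alongside the network parameters.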

Keywords: speech emotion classification; MFCC; center loss; multi-task learning; prior information; Dlinear

Classification code: TP241 [Automation and Computer Technology: Detection Technology and Automatic Equipment]
