Audiovisual emotion recognition based on a multi-head cross-attention mechanism

Authors: WANG Ziqiong; ZHAO Dechun [1]; QIN Lu; CHEN Yi; SHEN Yuchen (School of Life Health Information Science and Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, P.R. China)

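Affiliation: [1] School of Life Health Information Science and Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, P.R. China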

Source: Journal of Biomedical Engineering (《生物医学工程学杂志》), 2025, No. 1, pp. 24-31 (8 pages)

Funding: Natural Science Foundation of Chongqing (CSTB2024NSCQ-MSX0957).

Abstract: Representation learning has attracted considerable attention in audiovisual emotion recognition. The key is to construct effective affective representations that capture both the consistency and the differences between modalities, but doing so accurately remains challenging. This paper therefore proposes a cross-modal audiovisual emotion recognition model based on a multi-head cross-attention mechanism. The model uses a multi-head cross-attention architecture to fuse features and align modalities, and adopts a segmented training strategy to cope with missing modalities. In addition, a unimodal auxiliary loss task is designed and parameters are shared in order to preserve the independent information of each modality. On the Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D), the model achieved macro and micro F1 scores of 84.5% and 88.2%, respectively. The results indicate that the model effectively captures intra- and inter-modal feature representations of the audio and video modalities and unifies the unimodal and multimodal emotion recognition frameworks, offering a new approach to audiovisual emotion recognition.
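
Illustrative note: the abstract names multi-head cross-attention as the fusion mechanism but gives no architectural details. The sketch below shows only the generic operation, in which queries come from one modality (here audio frames) and keys and values come from the other (video frames). All dimensions, the random weights, and the function names are assumptions made for illustration; this is not the authors' model.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or extracted features."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def multi_head_cross_attention(query_seq, context_seq, w_q, w_k, w_v, w_o, num_heads):
    """Queries from one modality attend to keys/values from the other."""
    q = matmul(query_seq, w_q)    # (T_query, d_model)
    k = matmul(context_seq, w_k)  # (T_context, d_model)
    v = matmul(context_seq, w_v)  # (T_context, d_model)
    d_head = len(q[0]) // num_heads
    head_outputs, head_weights = [], []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        k_h_t = [list(col) for col in zip(*k_h)]  # transpose of k_h
        scores = matmul(q_h, k_h_t)               # (T_query, T_context)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v_h))
        head_weights.append(weights)
    # Concatenate the heads per query position, then apply the output projection.
    concat = [sum((head[t] for head in head_outputs), []) for t in range(len(q))]
    return matmul(concat, w_o), head_weights

rng = random.Random(0)
d_audio, d_video, d_model, heads = 6, 10, 8, 2   # assumed sizes
audio = random_matrix(5, d_audio, rng)           # 5 audio frames
video = random_matrix(3, d_video, rng)           # 3 video frames
w_q = random_matrix(d_audio, d_model, rng)
w_k = random_matrix(d_video, d_model, rng)
w_v = random_matrix(d_video, d_model, rng)
w_o = random_matrix(d_model, d_model, rng)
fused, weights = multi_head_cross_attention(audio, video, w_q, w_k, w_v, w_o, heads)
```

Each row of `weights[h]` is a distribution over the video frames for one audio frame, so `fused` holds one video-informed vector per audio frame. Swapping the roles of `audio` and `video` would give the opposite attention direction.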

Keywords: emotion recognition; representation learning; cross-attention; modality fusion

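Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TN912.3 [Automation and Computer Technology: Computer Science and Technology]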
