A Short-Video Emotion Recognition Method Based on Cross-Modal Fusion

ICVNet: A Method on Cross-Modal Fusion of Short Video Emotion Recognition


Authors: 薛均晓, 武雪程, 张牵, 田萌萌[1], 翟蓝航, 石磊[1] (XUE Jun-xiao; WU Xue-cheng; ZHANG Qian; TIAN Meng-meng; ZHAI Lan-hang; SHI Lei, School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China)

Affiliation: [1] School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China

Source: Chinese Journal of Ergonomics (《人类工效学》), 2022, No. 5, pp. 49-55 (7 pages)

Funding: National Natural Science Foundation of China (62006210); Training Plan for Young Backbone Teachers in Universities of Henan Province (22020GGJS014).

Abstract: Objective: This study explores a new recognition method to address the difficulty of distinguishing similar emotions in human-computer interaction. Methods: Three modalities — audio, video, and optical flow — are fused to build a cross-modal short-video emotion recognition method, denoted ICVNet. Results: (1) A multimodal emotion recognition dataset was built on the IEMOCAP benchmark; (2) ICVNet extracts feature information from the audio, video, and optical-flow modalities separately, then uses the pre-trained weights of the three modalities to perform decision-level feature fusion; (3) ICVNet constructs a dedicated fusion classification module for emotion recognition; (4) experimental results show that ICVNet achieves an emotion-recognition classification accuracy of 80.77%. Conclusion: The cross-modal fusion method for short-video emotion recognition established in this paper, ICVNet, can effectively improve emotion recognition accuracy in human-computer interaction scenarios.
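The decision-level fusion step described in the abstract — combining per-modality predictions from audio, video, and optical-flow branches into one class decision — can be sketched as follows. This is a minimal illustration only, not the authors' ICVNet implementation: the equal modality weights and softmax-probability averaging are assumptions, and the upstream feature extractors are abstracted away as precomputed logits.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def decision_level_fusion(audio_logits, video_logits, flow_logits,
                          weights=(1/3, 1/3, 1/3)):
    """Fuse class scores from three modality branches by a weighted
    average of their softmax probabilities, then pick the argmax class.

    Each *_logits array has shape (batch, num_classes); the modality
    weights here are a hypothetical equal split, not values from the paper.
    """
    fused = (weights[0] * softmax(audio_logits)
             + weights[1] * softmax(video_logits)
             + weights[2] * softmax(flow_logits))
    return fused.argmax(axis=-1)

# Toy usage: one sample, three emotion classes. The audio branch is
# confident about class 0; the other branches are mildly spread out,
# so the fused decision follows the strongest evidence.
audio = np.array([[2.0, 0.0, 0.0]])
video = np.array([[0.0, 1.0, 0.0]])
flow  = np.array([[0.0, 0.0, 1.0]])
print(decision_level_fusion(audio, video, flow))  # → [0]
```

Averaging probabilities rather than raw logits keeps each branch's contribution on a comparable scale regardless of how its logits are calibrated, which is one common reason to fuse at the decision level instead of concatenating features.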

Keywords: information fusion; video; human-computer interaction; cross-modal deep learning; emotion recognition; coordinate attention mechanism; neural network; image

Classification codes: B841.5 (Philosophy and Religion — Basic Psychology); B842.2 (Philosophy and Religion — Psychology)

 
