Authors: WU Xing [1], YIN Haoyu, YAO Junfeng, LI Weimin [1], QIAN Quan [1]
Affiliations: [1] School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; [2] CSSC Seago System Technology Co., Ltd., Shanghai 200010, China
Published in: Computer Engineering (《计算机工程》), 2024, No. 6, pp. 218-227 (10 pages)
Funding: Key Program of the National Natural Science Foundation of China (61936001); Shanghai Rising-Star Program (21QB1401900)
Abstract: Multimodal sentiment analysis aims to extract and integrate semantic information from text, image, and audio data in order to identify the emotional states of speakers in online videos. Although multimodal fusion methods have achieved promising results in this research area, existing approaches still fall short in handling distribution differences between modalities and in fusing relational knowledge. To address this, a multimodal sentiment analysis method is proposed. A Multimodal Prompt Gate (MPG) module is designed that converts nonverbal information into prompts fused with the textual context, using text information to filter the noise in nonverbal signals and obtain prompts containing rich semantic information, thereby enhancing information integration between modalities. In addition, an instance-to-label contrastive learning framework is proposed that distinguishes different labels in the latent space at the semantic level to further optimize the model output. Experiments on three large-scale sentiment analysis datasets show that the binary classification accuracy of the proposed method improves by approximately 0.7% over the next-best model, and the ternary classification accuracy improves by more than 2.5%, reaching 0.671. The method can serve as a reference for applying multimodal sentiment analysis to user profiling, video understanding, and AI interviews.
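The abstract describes the MPG module as gating nonverbal (audio/visual) signals with text information before turning them into prompts. A minimal sketch of that idea is below; the class name, layer structure, and feature dimensions are all assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class MultimodalPromptGate(nn.Module):
    """Hypothetical sketch: convert nonverbal features into text-conditioned prompts.

    A sigmoid gate computed from the concatenated text context and nonverbal
    features filters noise in the nonverbal signal; the gated signal is then
    projected into the prompt space.
    """

    def __init__(self, text_dim: int, nonverbal_dim: int, prompt_dim: int):
        super().__init__()
        self.gate = nn.Linear(text_dim + nonverbal_dim, nonverbal_dim)
        self.proj = nn.Linear(nonverbal_dim, prompt_dim)

    def forward(self, text_ctx: torch.Tensor, nonverbal: torch.Tensor) -> torch.Tensor:
        # text_ctx: (batch, text_dim); nonverbal: (batch, nonverbal_dim)
        g = torch.sigmoid(self.gate(torch.cat([text_ctx, nonverbal], dim=-1)))
        return self.proj(g * nonverbal)  # (batch, prompt_dim) prompt vector


# Example usage with arbitrary dimensions:
mpg = MultimodalPromptGate(text_dim=768, nonverbal_dim=74, prompt_dim=768)
prompt = mpg(torch.randn(4, 768), torch.randn(4, 74))
```

The gating design here is a common pattern for text-conditioned fusion; the paper's MPG may combine modalities differently.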
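The instance-to-label contrastive framework can likewise be sketched as pulling each instance representation toward its label embedding and pushing it away from other labels in the latent space. The function below is a generic formulation of that objective; the function name, the use of learnable label embeddings, and the temperature value are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F


def instance_label_contrastive_loss(
    features: torch.Tensor,      # (batch, dim) instance representations
    label_embeds: torch.Tensor,  # (num_labels, dim) label embeddings
    labels: torch.Tensor,        # (batch,) integer label ids
    temperature: float = 0.1,
) -> torch.Tensor:
    """Hypothetical instance-to-label contrastive objective.

    Cosine similarities between L2-normalized instances and label embeddings
    are scaled by a temperature and treated as logits, so cross-entropy
    maximizes similarity to the true label relative to all other labels.
    """
    f = F.normalize(features, dim=-1)
    l = F.normalize(label_embeds, dim=-1)
    logits = f @ l.t() / temperature  # (batch, num_labels)
    return F.cross_entropy(logits, labels)


# Example usage for ternary sentiment (negative / neutral / positive):
feats = torch.randn(8, 32)
label_emb = torch.randn(3, 32)
labels = torch.randint(0, 3, (8,))
loss = instance_label_contrastive_loss(feats, label_emb, labels)
```

This matches the abstract's description of separating labels semantically in the latent space, though the paper's exact positive/negative construction is not specified there.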
Keywords: multimodal sentiment analysis; semantic information; multimodal fusion; context representation; contrastive learning
Classification: TP183 [Automation and Computer Technology — Control Theory and Control Engineering]