Multimodal Alignment Sentiment Analysis Model Based on Image Captions

Authors: Rang Yuchen; Ma Jing [1]

Affiliation: [1] College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Source: Data Analysis and Knowledge Discovery, 2025, Issue 1, pp. 100-109 (10 pages)

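Funding: This work is one of the research outcomes of a General Program project of the National Natural Science Foundation of China (Grant No. 72174086).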

Abstract: [Objective] To reduce inter-modal differences and strengthen the correlation between modalities, this paper proposes a multimodal alignment sentiment analysis model that accurately captures the sentiment tendencies embedded in multimodal data. [Methods] For the text modality, the original text is first supplemented with image captions, and text features are then extracted with the pre-trained RoBERTa model. For the image modality, image features are extracted with the CLIP Vision Model. The text and image features are passed through a multimodal alignment layer built mainly on a Multimodal Transformer to obtain enhanced fused features. Finally, the fused multimodal features are fed into a multilayer perceptron for sentiment classification. [Results] On the MVSA-Multiple dataset, the proposed model achieves an accuracy of 71.78% and an F1 score of 68.97%, improvements of 1.78 and 0.07 percentage points, respectively, over the best-performing baseline. [Limitations] The model was not evaluated on additional datasets. [Conclusions] The proposed model effectively promotes inter-modal fusion, yields better fused representations, and improves sentiment analysis performance.
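The fusion stage described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. Random arrays stand in for the RoBERTa token features of the caption-supplemented text and for the CLIP Vision Model patch features, and both are assumed to be already projected to a shared dimension `D`. The "Multimodal Transformer" alignment layer is approximated, as an assumption, by one single-head cross-modal attention block per direction (text attending to image, image attending to text) with residual connections, in the spirit of MulT-style cross-modal attention. The helper names (`cross_modal_attention`, `fuse_and_classify`, `init_params`), the mean pooling, the hidden size, and all dimensions are hypothetical choices.

```python
import numpy as np

# Hypothetical sketch of cross-modal alignment + MLP classification; not the paper's code.
D = 16          # assumed shared feature dimension after projecting both encoders
N_CLASSES = 3   # MVSA sentiment labels: positive, neutral, negative


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(query_seq, context_seq, w_q, w_k, w_v):
    """Single-head attention where one modality's tokens query the other modality."""
    q = query_seq @ w_q                        # (n_q, D)
    k = context_seq @ w_k                      # (n_c, D)
    v = context_seq @ w_v                      # (n_c, D)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot-product scores, (n_q, n_c)
    return softmax(scores, axis=-1) @ v        # context-weighted values, (n_q, D)


def init_params(rng, hidden=32):
    """Random, untrained weights (hypothetical sizes) for both attention directions and the MLP."""
    def proj():
        return [rng.normal(0.0, D ** -0.5, (D, D)) for _ in range(3)]
    return {
        "t2i": proj(),   # text queries attend to image keys/values
        "i2t": proj(),   # image queries attend to text keys/values
        "w1": rng.normal(0.0, (2 * D) ** -0.5, (2 * D, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, hidden ** -0.5, (hidden, N_CLASSES)),
        "b2": np.zeros(N_CLASSES),
    }


def fuse_and_classify(text_feats, image_feats, params):
    # Bidirectional cross-modal attention with residual connections
    t = text_feats + cross_modal_attention(text_feats, image_feats, *params["t2i"])
    i = image_feats + cross_modal_attention(image_feats, text_feats, *params["i2t"])
    # Mean-pool each enhanced sequence and concatenate into one fused vector, (2D,)
    fused = np.concatenate([t.mean(axis=0), i.mean(axis=0)])
    # Two-layer MLP (ReLU hidden layer) producing class probabilities, (N_CLASSES,)
    h = np.maximum(0.0, fused @ params["w1"] + params["b1"])
    return softmax(h @ params["w2"] + params["b2"])


rng = np.random.default_rng(0)
params = init_params(rng)
text_feats = rng.normal(size=(12, D))    # stand-in for RoBERTa features (text + caption)
image_feats = rng.normal(size=(50, D))   # stand-in for CLIP vision patch features
probs = fuse_and_classify(text_feats, image_feats, params)
print(probs.round(3))
```

Because the weights are random and untrained, the printed probabilities carry no meaning; in practice the projections and the MLP would be learned end to end, for example with a cross-entropy loss over the three sentiment labels. Attending in both directions lets each text token draw on image regions and each image region draw on text tokens, which is one common way to narrow the modality gap the paper targets.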

Keywords: multimodal sentiment analysis; image captions; modality fusion

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
