Multimodal Alignment Sentiment Analysis Model Based on Image Captions

Authors: Rang Yuchen; Ma Jing[1]

Affiliation: [1] College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2025, No. 1, pp. 100-109 (10 pages)

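Funding: This work is one of the research outcomes of a General Program project of the National Natural Science Foundation of China (Grant No. 72174086).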

Abstract: [Objective] To reduce inter-modal differences and strengthen the correlation between modalities, this paper proposes a multimodal alignment sentiment analysis model that accurately captures the sentiment tendencies embedded in multimodal data.
[Methods] For the text modality, the original text is supplemented with image captions and then encoded with the pre-trained RoBERTa model to extract text features. For the image modality, the CLIP Vision Model is used to extract image features. The extracted text and image features are passed through a multimodal alignment layer built mainly on a Multimodal Transformer to obtain enhanced fused features. Finally, the fused multimodal features are fed into a multilayer perceptron for sentiment classification.
[Results] On the MVSA-Multiple dataset, the proposed model achieves an accuracy of 71.78% and an F1 score of 68.97%, improvements of 1.78 and 0.07 percentage points, respectively, over the best-performing baseline model.
[Limitations] The model's performance was not validated on additional datasets.
[Conclusions] The proposed model effectively promotes inter-modal fusion, learns better fused representations, and improves sentiment analysis performance.
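The fusion and classification stages of the [Methods] pipeline can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions. Random matrices stand in for the caption-augmented RoBERTa token features and the CLIP Vision Model patch features, and both are assumed to be already projected to a shared dimension d. A single-head, bidirectional cross-modal attention block with residual connections stands in for the Multimodal Transformer alignment layer. Mean pooling, the two-layer MLP head, and all function and parameter names (`cross_modal_attention`, `init_params`, `fuse_and_classify`) are hypothetical. The three output classes assume MVSA's negative/neutral/positive labels.

```python
import numpy as np

# Hypothetical label order; MVSA annotates negative / neutral / positive sentiment.
LABELS = ("negative", "neutral", "positive")


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head scaled dot-product attention in which one modality (queries)
    attends to the other (keys/values), followed by a residual connection.
    Assumes both modalities share the feature dimension d."""
    q = query_feats @ w_q                        # (n_q, d)
    k = context_feats @ w_k                      # (n_c, d)
    v = context_feats @ w_v                      # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (n_q, n_c)
    return query_feats + softmax(scores, axis=-1) @ v


def init_params(d, hidden, n_classes, rng):
    """Randomly initialised, untrained weights (for shape checking only)."""
    def attn():
        return tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    return {
        "t2i": attn(),  # text tokens attend to image patches
        "i2t": attn(),  # image patches attend to text tokens
        "w1": rng.normal(scale=(2 * d) ** -0.5, size=(2 * d, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(scale=hidden ** -0.5, size=(hidden, n_classes)),
        "b2": np.zeros(n_classes),
    }


def fuse_and_classify(text_feats, image_feats, params):
    """Align the two modalities in both directions, mean-pool each aligned
    sequence, concatenate, and classify the fused vector with a two-layer MLP."""
    t2i = cross_modal_attention(text_feats, image_feats, *params["t2i"])
    i2t = cross_modal_attention(image_feats, text_feats, *params["i2t"])
    fused = np.concatenate([t2i.mean(axis=0), i2t.mean(axis=0)])   # (2d,)
    hidden = np.maximum(0.0, fused @ params["w1"] + params["b1"])    # ReLU
    logits = hidden @ params["w2"] + params["b2"]
    return softmax(logits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 16
    params = init_params(d=d, hidden=32, n_classes=len(LABELS), rng=rng)
    # Stand-ins for caption-augmented RoBERTa token features and CLIP patch features.
    text_feats = rng.normal(size=(12, d))
    image_feats = rng.normal(size=(50, d))
    probs = fuse_and_classify(text_feats, image_feats, params)
    print(dict(zip(LABELS, np.round(probs, 3))))
```

Running the script prints one probability per label. Because the weights are untrained, the values only confirm that the shapes flow through the pipeline. Attending in both directions (text to image and image to text) is one simple way to let each modality be enriched by the other before fusion. A full Multimodal Transformer would add multiple heads, layer normalization, feed-forward sublayers, and stacked layers.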

Keywords: multimodal; sentiment analysis; image captions; modality fusion

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
