Authors: WU Xing [1], YIN Haoyu, YAO Junfeng, LI Weimin [1], QIAN Quan [1]
Affiliations: [1] School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; [2] CSSC Seago System Technology Co., Ltd., Shanghai 200010, China
Published in: Computer Engineering (《计算机工程》), 2024, No. 6, pp. 218-227 (10 pages)
Funding: Key Program of the National Natural Science Foundation of China (61936001); Shanghai Rising-Star Program (21QB1401900)
Abstract: Multimodal sentiment analysis aims to extract and integrate semantic information from text, image, and audio data in order to identify the emotional states of speakers in online videos. Although multimodal fusion methods have achieved promising results in this research area, existing approaches still fall short in handling distribution differences between modalities and in fusing relational knowledge. To address this, a multimodal sentiment analysis method is proposed. A Multimodal Prompt Gate (MPG) module is designed that converts nonverbal information into prompts fused with the textual context, using text information to filter the noise in nonverbal signals and obtain prompts containing rich semantic information, thereby enhancing information integration between modalities. In addition, an instance-to-label contrastive learning framework is proposed that distinguishes different labels in the latent space at the semantic level to further optimize the model output. Experiments on three large-scale sentiment analysis datasets show that the binary classification accuracy of the proposed method improves by approximately 0.7% over the next-best model, and the ternary classification accuracy improves by more than 2.5%, reaching 0.671. The method can serve as a reference for applying multimodal sentiment analysis to user profiling, video understanding, and AI interviews.
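The abstract describes the MPG module as gating nonverbal (audio/visual) signals with text information before turning them into prompts. A minimal sketch of that idea is below; the class name, layer structure, and feature dimensions are all assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class MultimodalPromptGate(nn.Module):
    """Hypothetical sketch: convert nonverbal features into text-conditioned prompts.

    A sigmoid gate computed from the concatenated text context and nonverbal
    features filters noise in the nonverbal signal; the gated signal is then
    projected into the prompt space.
    """

    def __init__(self, text_dim: int, nonverbal_dim: int, prompt_dim: int):
        super().__init__()
        self.gate = nn.Linear(text_dim + nonverbal_dim, nonverbal_dim)
        self.proj = nn.Linear(nonverbal_dim, prompt_dim)

    def forward(self, text_ctx: torch.Tensor, nonverbal: torch.Tensor) -> torch.Tensor:
        # text_ctx: (batch, text_dim); nonverbal: (batch, nonverbal_dim)
        g = torch.sigmoid(self.gate(torch.cat([text_ctx, nonverbal], dim=-1)))
        return self.proj(g * nonverbal)  # (batch, prompt_dim) prompt vector


# Example usage with arbitrary dimensions:
mpg = MultimodalPromptGate(text_dim=768, nonverbal_dim=74, prompt_dim=768)
prompt = mpg(torch.randn(4, 768), torch.randn(4, 74))
```

The gating design here is a common pattern for text-conditioned fusion; the paper's MPG may combine modalities differently.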
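The instance-to-label contrastive framework can likewise be sketched as pulling each instance representation toward its label embedding and pushing it away from other labels in the latent space. The function below is a generic formulation of that objective; the function name, the use of learnable label embeddings, and the temperature value are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F


def instance_label_contrastive_loss(
    features: torch.Tensor,      # (batch, dim) instance representations
    label_embeds: torch.Tensor,  # (num_labels, dim) label embeddings
    labels: torch.Tensor,        # (batch,) integer label ids
    temperature: float = 0.1,
) -> torch.Tensor:
    """Hypothetical instance-to-label contrastive objective.

    Cosine similarities between L2-normalized instances and label embeddings
    are scaled by a temperature and treated as logits, so cross-entropy
    maximizes similarity to the true label relative to all other labels.
    """
    f = F.normalize(features, dim=-1)
    l = F.normalize(label_embeds, dim=-1)
    logits = f @ l.t() / temperature  # (batch, num_labels)
    return F.cross_entropy(logits, labels)


# Example usage for ternary sentiment (negative / neutral / positive):
feats = torch.randn(8, 32)
label_emb = torch.randn(3, 32)
labels = torch.randint(0, 3, (8,))
loss = instance_label_contrastive_loss(feats, label_emb, labels)
```

This matches the abstract's description of separating labels semantically in the latent space, though the paper's exact positive/negative construction is not specified there.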
Keywords: multimodal sentiment analysis; semantic information; multimodal fusion; context representation; contrastive learning
Classification: TP183 [Automation and Computer Technology — Control Theory and Control Engineering]