基于双流残差融合的多模态讽刺解释研究

Research on Multimodal Sarcasm Explanation Based on Dual-stream Residual Fusion


Authors: WU Yunbing [1,2]; ZENG Weisen; GAO Hang; YIN Aiying; LIAO Xiangwen (College of Computer and Big Data, Fuzhou University, Fuzhou 350108, China; Digital Fujian Institute of Financial Big Data, Fuzhou 350108, China; Department of Computer Engineering, Zhicheng College of Fuzhou University, Fuzhou 350002, China)

Affiliations: [1] College of Computer and Big Data, Fuzhou University, Fuzhou 350108, China; [2] Digital Fujian Institute of Financial Big Data, Fuzhou 350108, China; [3] Department of Computer Engineering, Zhicheng College of Fuzhou University, Fuzhou 350002, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2024, No. 11, pp. 2628-2635 (8 pages)

Funding: Supported by the National Natural Science Foundation of China (61976054) and the Natural Science Foundation of Fujian Province (2022J01116).

Abstract: Existing multimodal sarcasm explanation models attend only to fine-grained features of the image during fusion, which leads to weak explanations and poorly integrated multimodal features. To address this, we design a multimodal fusion mechanism based on dual-stream residual attention. First, text and image features are extracted with the BART and VGG19 models, respectively. Next, two multi-head attention streams (image-to-text and text-to-image) attend to fine-grained information in both modalities; since plain multi-head self-attention cannot capture inter-modal correlations well, an Attention-on-Attention (AoA) module is employed to allocate feature weights appropriately. Finally, the fused multimodal features are concatenated and fed into the BART decoder to generate the sarcasm explanation. Experiments on the public MORE dataset show that, compared with the ExMore model, our model improves METEOR by 4.35% and ROUGE-L by 3.39%, indicating that it fuses modal features more effectively and significantly improves the quality of the generated explanations.
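The fusion pipeline described in the abstract (two cross-modal attention streams followed by AoA gating and concatenation) can be sketched as follows. This is a minimal single-head NumPy illustration, not the authors' implementation: the weight matrices are randomly initialized, the feature dimensions are toy-sized, and the real model would use learned multi-head attention over BART and VGG19 features.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(Q, K, V):
    """Scaled dot-product cross attention: queries from one
    modality, keys/values from the other."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def attention_on_attention(Q, att, Wi, bi, Wg, bg):
    """AoA gating: an information vector and a sigmoid gate are
    both computed from [query; attention result], and the gate
    filters the information element-wise."""
    x = np.concatenate([Q, att], axis=-1)
    info = x @ Wi + bi
    gate = 1.0 / (1.0 + np.exp(-(x @ Wg + bg)))
    return gate * info

rng = np.random.default_rng(0)
d = 8
text = rng.standard_normal((5, d))  # 5 text tokens (stand-in for BART features)
img = rng.standard_normal((3, d))   # 3 image regions (stand-in for VGG19 features)

# Two streams: text attends to image regions, image attends to text tokens.
t2i = cross_attention(text, img, img)
i2t = cross_attention(img, text, text)

# Illustrative random AoA parameters (learned in the real model).
Wi, bi = rng.standard_normal((2 * d, d)), np.zeros(d)
Wg, bg = rng.standard_normal((2 * d, d)), np.zeros(d)
text_fused = attention_on_attention(text, t2i, Wi, bi, Wg, bg)
img_fused = attention_on_attention(img, i2t, Wi, bi, Wg, bg)

# Concatenate the two fused streams before handing off to a decoder.
fused = np.concatenate([text_fused, img_fused], axis=0)
print(fused.shape)  # (8, 8)
```

In the paper's setting, `fused` would be consumed by the BART decoder to generate the explanation; here it simply demonstrates the dual-stream shape bookkeeping.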

Keywords: natural language processing; deep learning; sarcasm explanation; multimodality; attention mechanism

Classification: TP391 [Automation and Computer Technology — Computer Application Technology]

 
