A Low-Rank Cross-Modal Transformer for Multimodal Sentiment Analysis

Authors: SUN Jie; CHE Wen-gang [1]; GAO Sheng-xiang [1] (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)

Affiliation: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China

Source: Computer Engineering & Science (计算机工程与科学), 2024, Issue 10, pp. 1888-1900 (13 pages)

Funding: National Natural Science Foundation of China (61972186); Yunnan Science and Technology Talents and Platform Program (202105AC160018).

Abstract: Multimodal sentiment analysis, which extends text-based methods to multimodal settings that include visual and speech signals, has become a popular research direction in affective computing. Under the pretrain-finetune paradigm, fine-tuning a pretrained language model is necessary for strong performance on multimodal sentiment analysis; however, fine-tuning large-scale pretrained language models remains expensive, and insufficient cross-modal interaction also hinders performance. A low-rank cross-modal Transformer (LRCMT) is therefore proposed to address these limitations. Inspired by the low-rank parameter updates that large pretrained language models exhibit when adapting to downstream natural language processing tasks, LRCMT injects trainable low-rank parameter matrices into each frozen layer, which greatly reduces the number of trainable parameters while still allowing dynamic word representations. In addition, a cross-modal interaction module is designed in which the visual and speech modalities first interact with each other before interacting with the text modality, enabling fuller cross-modal fusion. Extensive experiments on multimodal sentiment analysis benchmarks demonstrate LRCMT's effectiveness and efficiency: fine-tuning only about 0.76% of the full parameter count, LRCMT achieves performance comparable to or better than full fine-tuning, and it obtains state-of-the-art or competitive results on many metrics. Ablation studies show that low-rank fine-tuning combined with sufficient cross-modal interaction contributes to LRCMT's performance. Overall, this work reduces the cost of fine-tuning pretrained language models for multimodal tasks and offers insights into efficient and effective cross-modal fusion.
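The low-rank injection the abstract describes follows the general recipe of freezing pretrained weights and training only a pair of small factor matrices per layer. Below is a minimal PyTorch sketch of that idea; the class name LowRankAdapter, the rank of 8, the zero/Gaussian initialization, and the choice of wrapping a plain nn.Linear are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: wrap a frozen pretrained linear layer with a
# trainable low-rank update, so the effective weight is W + B @ A.
# Names and hyperparameters are assumptions, not LRCMT's actual code.
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        d_out, d_in = base.weight.shape
        # Low-rank factors: A projects down to `rank`, B projects back up.
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero init: no change at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen pretrained path plus the trainable low-rank correction.
        return self.base(x) + x @ self.A.T @ self.B.T
```

For a 768-dimensional projection with rank 8, each adapted layer trains about 8 x (768 + 768) = 12K parameters against roughly 590K frozen ones, on the order of 2% per layer; which projections LRCMT actually adapts is not stated in the abstract, so this sketch does not by itself reproduce the reported 0.76% overall ratio.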
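The interaction order the abstract describes, where visual and speech fuse with each other before either attends to text, can likewise be sketched with standard attention layers. Everything here, from the module names to the mean-pooled concatenation at the end, is an assumed stand-in for LRCMT's actual cross-modal module.

```python
# Hypothetical sketch of the two-stage interaction order: visual and
# speech attend to each other first, then each attends to the text stream.
import torch
import torch.nn as nn

class CrossModalInteraction(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.v_attends_a = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.a_attends_v = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.v_attends_t = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.a_attends_t = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, text, visual, audio):
        # Stage 1: visual <-> speech interaction before touching text.
        v, _ = self.v_attends_a(visual, audio, audio)
        a, _ = self.a_attends_v(audio, visual, visual)
        # Stage 2: the pre-fused streams interact with the text modality.
        v, _ = self.v_attends_t(v, text, text)
        a, _ = self.a_attends_t(a, text, text)
        # Pool over the sequence dimension and concatenate for a sentiment head.
        return torch.cat([text.mean(1), v.mean(1), a.mean(1)], dim=-1)
```

The point of the two-stage ordering is that the weaker non-text modalities reinforce each other before meeting the dominant text stream, which is what the abstract credits for the fuller cross-modal fusion.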

Keywords: multimodal; sentiment analysis; pretrained language model; cross-modal Transformer

Classification: TP391 (Automation and Computer Technology: Computer Application Technology)

 
