UniTrans: Unified Parameter-Efficient Transfer Learning and Multimodal Alignment for Large Multimodal Foundation Model

Authors: Jiakang Sun, Ke Chen, Xinyang He, Xu Liu, Ke Li, Cheng Peng

Affiliations: [1] Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, 610213, China; [2] School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, 101499, China

Source: Computers, Materials & Continua (English-language journal), 2025, Issue 4, pp. 219-238 (20 pages)

Abstract: With the advancements in parameter-efficient transfer learning techniques, it has become feasible to leverage large pre-trained language models for downstream tasks under low-cost and low-resource conditions. However, applying this technique to multimodal knowledge transfer introduces a significant challenge: ensuring alignment across modalities while minimizing the number of additional parameters required for downstream task adaptation. This paper introduces UniTrans, a framework aimed at facilitating efficient knowledge transfer across multiple modalities. UniTrans leverages Vector-based Cross-modal Random Matrix Adaptation to enable fine-tuning with minimal parameter overhead. To further enhance modality alignment, we introduce two key components: the Multimodal Consistency Alignment Module and the Query-Augmentation Side Network, specifically optimized for scenarios with extremely limited trainable parameters. Extensive evaluations on various cross-modal downstream tasks demonstrate that our approach surpasses state-of-the-art methods while using just 5% of their trainable parameters. Additionally, it achieves superior performance compared to fully fine-tuned models on certain benchmarks.

Keywords: Parameter-efficient transfer learning; multimodal alignment; image captioning; image-text retrieval; visual question answering

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
