Dual-modality domain-agnostic prompts guided cross-domain image classification

Authors: Xu Yuanyuan; Kan Meina; Shan Shiguang; Chen Xilin (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; Peng Cheng Laboratory, Shenzhen 518055, China)

Affiliations: [1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190 [2] School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049 [3] Peng Cheng Laboratory, Shenzhen 518055

Source: Journal of Image and Graphics (《中国图象图形学报》), 2025, No. 2, pp. 503-517 (15 pages)

Funding: National Natural Science Foundation of China (62122074); Innovation Project of the Institute of Computing Technology, Chinese Academy of Sciences (E201140).

Abstract:

Objective: Domain adaptation aims to use information from a labeled source domain to improve task performance on an unlabeled target domain. Recently, the contrastive language-image pre-training model CLIP has demonstrated impressive generalization ability in downstream classification tasks, and some methods have incorporated it into domain adaptation to enhance the model's generalization in the target domain. However, current CLIP-based domain adaptation methods typically adjust only the features of the textual modality while leaving the visual features unchanged, which limits the performance gain in the target domain: these methods overlook the importance of discriminative image features for classification and neglect the synergistic role of the visual modality in eliminating domain discrepancy. This issue is addressed by a domain adaptation method for image classification guided by dual-modality domain-agnostic prompts (DDAPs).

Method: DDAPs introduces dual-modality prompt learning, fine-tuning textual and visual features simultaneously so that the two modalities collaboratively address the domain discrepancy. On the one hand, DDAPs learns more discriminative text and image features, improving performance on the downstream classification task; on the other hand, it learns domain-invariant text and image features by eliminating the discrepancy between the source and target domains, improving performance on the target domain. Both goals are achieved by adding a domain-agnostic textual prompt module and a domain-agnostic visual prompt module and fine-tuning CLIP with a classification loss and an alignment loss. The textual prompt module employs prompt learning to fine-tune the text encoder, fostering domain-agnostic, discriminative text features; prompt learning is applied at the task level, so the textual prompts are shared across domains and categories. Similarly, the visual prompt module employs visual prompt learning to fine-tune the image encoder, cultivating domain-agnostic, discriminative image features; the visual prompts are likewise task-level and shared across domains and samples. Because CLIP's original pre-training task matches paired images and text, fine-tuning with a classification loss drives it to learn features that are more discriminative for the downstream task: DDAPs classifies samples using the labels of the source domain and pseudo-labels of the target domain. For the alignment loss, DDAPs aligns the image feature distributions of the source and target domains with the maximum mean discrepancy (MMD) loss, eliminating the domain discrepancy of the image features.

Results: The method applies to both single-source and multi-source domain adaptation. For single-source domain adaptation, experiments on the Office-Home, VisDA-2017, and Office-31 datasets achieve average classification accuracies of 87.1%, 89.6%, and 91.6%, respectively, the best performance to date. For multi-source domain adaptation, experiments on Office-Home achieve an average classification accuracy of 88.6%. Ablation studies on Office-Home verify the effectiveness of the domain-agnostic textual and visual prompt modules.

Conclusion: By fine-tuning the pre-trained CLIP model through the domain-agnostic textual and visual prompt modules, DDAPs enables the model to learn discriminative, domain-invariant features across the source and target domains, improving classification performance on the target domain.
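The alignment loss mentioned in the abstract is the maximum mean discrepancy between source- and target-domain image feature distributions. As a minimal illustrative sketch (not the authors' implementation), the biased MMD² estimator with a Gaussian kernel can be written as follows; the feature dimension, bandwidth `gamma`, and synthetic features are assumptions chosen for demonstration only:

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd2(source, target, gamma=1.0 / 8):
    """Biased estimate of squared maximum mean discrepancy (MMD)."""
    return (rbf_kernel(source, source, gamma).mean()
            + rbf_kernel(target, target, gamma).mean()
            - 2.0 * rbf_kernel(source, target, gamma).mean())

# Synthetic stand-ins for batches of image features (dim 8).
rng = np.random.default_rng(0)
source_feats = rng.normal(0.0, 1.0, size=(64, 8))
target_near = rng.normal(0.0, 1.0, size=(64, 8))   # same distribution -> small MMD
target_far = rng.normal(1.5, 1.0, size=(64, 8))    # shifted distribution -> large MMD
```

In a DDAPs-style training loop, this quantity would be computed on CLIP image features of source and target batches and added to the classification loss; in practice the kernel bandwidth is usually set by a heuristic such as the median pairwise distance rather than the fixed value assumed here.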

Keywords: single-source domain adaptation; multi-source domain adaptation; domain adaptation; transfer learning; dual-modality prompt learning

CLC number: TP391 (Automation and Computer Technology — Computer Application Technology)

 
