Authors: LIU Qiujie (刘秋杰); WAN Yuan (万源); WU Jie (吴杰) (School of Science, Wuhan University of Technology, Wuhan, Hubei 430070, China)
Source: Journal of Computer Applications (《计算机应用》), 2024, No. 1, pp. 24-31 (8 pages)
Fund: Fundamental Research Funds for the Central Universities (2021Ⅲ030JC)
Abstract: Cross-modal retrieval based on deep networks often faces the challenge of insufficient training data, which limits the training effect and easily leads to over-fitting. Transfer learning, which learns from training data in a source domain and transfers the acquired knowledge to a target domain, is an effective way to alleviate the shortage of training data. However, most existing transfer learning methods focus on transferring knowledge from a single-modal (e.g. image) source domain to a cross-modal (e.g. image and text) target domain. If the source domain already contains multiple modalities, such asymmetric transfer ignores the potential inter-modal semantic information in the source domain; moreover, these methods cannot well extract the similarity between the same modality in the source and target domains, and therefore fail to reduce the domain discrepancy. To address this, a Deep Bi-modal source domain Symmetrical Transfer Learning method for cross-modal retrieval (DBSTL) was proposed. The method aims to transfer knowledge from a bi-modal source domain to a cross-modal target domain and to obtain a common representation of cross-modal data. DBSTL consists of a modality-symmetric transfer subnet and a semantic consistency learning subnet. The modality-symmetric transfer subnet adopts a hybrid symmetric structure that makes inter-modal information more consistent during knowledge transfer and reduces the difference between the source and target domains. In the semantic consistency learning subnet, all modalities share the same common representation layer, and cross-modal semantic consistency is ensured under the guidance of the target domain's supervision information. Experimental results show that on the Pascal, NUS-WIDE-10k and Wikipedia datasets, the mean Average Precision (mAP) of the proposed method is about 8.4, 0.4 and 1.2 percentage points higher, respectively, than the best results obtained by the comparison methods. DBSTL makes full use of the potential information of the bi-modal source domain for symmetric transfer learning, ensures inter-modal semantic consistency under the guidance of supervision information, and improves the similarity of the image and text distributions in the common representation space.
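The core idea of a shared common representation layer can be illustrated with a minimal sketch. The dimensions, random linear projections, and toy data below are placeholders for illustration only, not the paper's actual deep network: each modality is mapped into a shared space, and cross-modal retrieval reduces to cosine similarity there.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions (not specified in the abstract).
D_IMG, D_TXT, D_COMMON = 512, 300, 128

# Modality-specific projections into the common space. In DBSTL these
# would be learned subnet layers; here they are random linear maps.
W_img = rng.normal(size=(D_IMG, D_COMMON)) / np.sqrt(D_IMG)
W_txt = rng.normal(size=(D_TXT, D_COMMON)) / np.sqrt(D_TXT)

def to_common(x, W):
    """Project modality features into the shared space, L2-normalized."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# Toy batch: 5 image feature vectors and 5 text feature vectors.
imgs = rng.normal(size=(5, D_IMG))
txts = rng.normal(size=(5, D_TXT))

z_img = to_common(imgs, W_img)
z_txt = to_common(txts, W_txt)

# Cross-modal retrieval: cosine similarity in the common space.
sim = z_img @ z_txt.T               # (5, 5) image-to-text similarities
ranking = np.argsort(-sim, axis=1)  # texts ranked per image query
```

In a trained model, the projections would be optimized so that semantically matched image-text pairs land close together in the common space; the ranking step and an mAP computation over it would be unchanged.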
Classification: TP391.3 [Automation and Computer Technology - Computer Application Technology]