Authors: LIU Qiujie (刘秋杰); WAN Yuan (万源); WU Jie (吴杰) (School of Science, Wuhan University of Technology, Wuhan, Hubei 430070, China)
Source: Journal of Computer Applications (《计算机应用》), 2024, No. 1, pp. 24-31 (8 pages)
Fund: Fundamental Research Funds for the Central Universities (2021Ⅲ030JC)
Abstract: Cross-modal retrieval based on deep networks often faces the challenge of insufficient training data, which limits the training effect and easily leads to over-fitting. Transfer learning, which learns from training data in a source domain and transfers the acquired knowledge to a target domain, is an effective way to alleviate the shortage of training data. However, most existing transfer learning methods focus on transferring knowledge from a single-modal (e.g. image) source domain to a cross-modal (e.g. image and text) target domain. If the source domain already contains multiple modalities, such asymmetric transfer ignores the potential inter-modal semantic information in the source domain; moreover, these methods cannot well extract the similarity between the same modality in the source and target domains, and therefore fail to reduce the domain discrepancy. To address this, a Deep Bi-modal source domain Symmetrical Transfer Learning method for cross-modal retrieval (DBSTL) was proposed. The method aims to transfer knowledge from a bi-modal source domain to a cross-modal target domain and to obtain a common representation of cross-modal data. DBSTL consists of a modality-symmetric transfer subnet and a semantic consistency learning subnet. The modality-symmetric transfer subnet adopts a hybrid symmetric structure that makes inter-modal information more consistent during knowledge transfer and reduces the difference between the source and target domains. In the semantic consistency learning subnet, all modalities share the same common representation layer, and cross-modal semantic consistency is ensured under the guidance of the target domain's supervision information. Experimental results show that on the Pascal, NUS-WIDE-10k and Wikipedia datasets, the mean Average Precision (mAP) of the proposed method is about 8.4, 0.4 and 1.2 percentage points higher, respectively, than the best results obtained by the comparison methods. DBSTL makes full use of the potential information of the bi-modal source domain for symmetric transfer learning, ensures inter-modal semantic consistency under the guidance of supervision information, and improves the similarity of the image and text distributions in the common representation space.
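The core idea of a shared common representation layer can be illustrated with a minimal sketch. The dimensions, random linear projections, and toy data below are placeholders for illustration only, not the paper's actual deep network: each modality is mapped into a shared space, and cross-modal retrieval reduces to cosine similarity there.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions (not specified in the abstract).
D_IMG, D_TXT, D_COMMON = 512, 300, 128

# Modality-specific projections into the common space. In DBSTL these
# would be learned subnet layers; here they are random linear maps.
W_img = rng.normal(size=(D_IMG, D_COMMON)) / np.sqrt(D_IMG)
W_txt = rng.normal(size=(D_TXT, D_COMMON)) / np.sqrt(D_TXT)

def to_common(x, W):
    """Project modality features into the shared space, L2-normalized."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# Toy batch: 5 image feature vectors and 5 text feature vectors.
imgs = rng.normal(size=(5, D_IMG))
txts = rng.normal(size=(5, D_TXT))

z_img = to_common(imgs, W_img)
z_txt = to_common(txts, W_txt)

# Cross-modal retrieval: cosine similarity in the common space.
sim = z_img @ z_txt.T               # (5, 5) image-to-text similarities
ranking = np.argsort(-sim, axis=1)  # texts ranked per image query
```

In a trained model, the projections would be optimized so that semantically matched image-text pairs land close together in the common space; the ranking step and an mAP computation over it would be unchanged.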
Classification: TP391.3 [Automation and Computer Technology - Computer Application Technology]