Cross-Modal Dual Hashing Based on Transfer Knowledge

(Original Chinese title: 基于迁移知识的跨模态双重哈希)

Authors: ZHONG Jian-qi (钟建奇); LIN Qiu-bin (林秋斌); CAO Wen-ming (曹文明) [1]

Affiliation: [1] School of Electronic and Information Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China

Source: Acta Electronica Sinica (《电子学报》), 2025, No. 1, pp. 209-220 (12 pages)

Funding: National Natural Science Foundation of China (No. 617714322); Shenzhen Basic Research Program (No. JCYJ20220531100814033)

Abstract: With the popularity of social networks and the rapid growth of multimedia data, effective cross-modal retrieval has attracted increasing attention. Hashing is widely used in cross-modal retrieval tasks because of its high retrieval efficiency and low storage cost. However, most existing deep-learning-based cross-modal hashing methods rely on an image network and a text network to generate the hash codes for their respective modalities independently, which makes it difficult to obtain more effective hash codes and to further reduce the modality gap between data of different modalities. To improve the performance of cross-modal hashing retrieval, this paper proposes Cross-modal Dual Hashing based on Transfer Knowledge (CDHTK). CDHTK performs cross-modal hashing retrieval by combining an image network, a knowledge transfer network, and a text network. For the image modality, CDHTK fuses the hash codes generated by the image network and the knowledge transfer network to produce discriminative image hash codes; for the text modality, it fuses the hash codes generated by the text network and the knowledge transfer network to produce effective text hash codes. CDHTK jointly optimizes the hash code generation process with a cross-entropy loss on the predicted labels, a joint triplet quantization loss on the generated hash codes, and a differential loss on the transferred knowledge, thereby improving the retrieval performance of the model. Experiments on two commonly used datasets (IAPR TC-12 and MIR-Flickr 25K) verify the effectiveness of CDHTK, which outperforms the current state-of-the-art cross-modal hashing method ALECH (Adaptive Label correlation based asymmEtric Cross-modal Hashing) by 6.82% and 5.13%, respectively.
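The abstract describes CDHTK's overall recipe (two modality networks plus a knowledge transfer network, per-modality code fusion, and three combined losses) without giving implementation details. As a reading aid, below is a minimal PyTorch-style sketch of how such a dual-hashing objective could be wired together; every module name, fusion rule, loss form, and feature dimension in it is a hypothetical placeholder chosen for illustration, not the paper's actual code.

```python
# Minimal sketch of the ideas in the abstract. CAUTION: all names here
# (HashBranch, fuse, the exact loss forms, all dimensions) are assumptions,
# not the authors' CDHTK implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashBranch(nn.Module):
    """Maps input features to relaxed K-bit hash codes plus label logits."""
    def __init__(self, in_dim: int, n_bits: int, n_classes: int):
        super().__init__()
        self.hash_layer = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, n_bits), nn.Tanh(),  # codes relaxed into (-1, 1)
        )
        self.classifier = nn.Linear(n_bits, n_classes)  # label-prediction head

    def forward(self, x):
        h = self.hash_layer(x)  # continuous hash code
        return h, self.classifier(h)

def fuse(h_modal, h_transfer):
    """Assumed fusion of a modality code with the transfer-network code."""
    return 0.5 * (h_modal + h_transfer)

def triplet_quantization_loss(h_a, h_p, h_n, margin=0.5):
    """Triplet ranking term plus a penalty pushing codes toward {-1, +1}."""
    triplet = F.triplet_margin_loss(h_a, h_p, h_n, margin=margin)
    quantization = (h_a.abs() - 1.0).pow(2).mean()
    return triplet + quantization

def differential_loss(h_tr_img, h_tr_txt):
    """Assumed form: align the transfer codes of the two modalities."""
    return F.mse_loss(h_tr_img, h_tr_txt)

if __name__ == "__main__":
    torch.manual_seed(0)
    B, K, C = 8, 64, 24                  # batch size, code length, #labels
    img_branch = HashBranch(4096, K, C)  # image-feature dimension assumed
    txt_branch = HashBranch(1386, K, C)  # text-feature dimension assumed
    img_feats, txt_feats = torch.randn(B, 4096), torch.randn(B, 1386)
    labels = torch.randint(0, C, (B,))

    h_img, img_logits = img_branch(img_feats)
    h_txt, txt_logits = txt_branch(txt_feats)
    # Stand-ins for the knowledge-transfer network's codes for each modality:
    h_tr_img = torch.tanh(torch.randn(B, K))
    h_tr_txt = torch.tanh(torch.randn(B, K))

    f_img = fuse(h_img, h_tr_img)  # fused image hash code
    f_txt = fuse(h_txt, h_tr_txt)  # fused text hash code

    loss = (F.cross_entropy(img_logits, labels)
            + F.cross_entropy(txt_logits, labels)
            # cross-modal triplet: image anchor, text positive, shifted negative
            + triplet_quantization_loss(f_img, f_txt, f_txt.roll(1, dims=0))
            + differential_loss(h_tr_img, h_tr_txt))
    loss.backward()
    print(f"total loss: {loss.item():.4f}")
```

Averaging is only one simple fusion choice; the paper may instead use concatenation or a learned weighting, and the precise triplet quantization and differential loss formulations would have to be taken from the full text.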

Keywords: cross-modal; image-text retrieval; dual hashing; transfer knowledge

Classification: TP391 [Automation and Computer Technology / Computer Application Technology]

 
