Discriminant Adversarial Hashing Transformer for Cross-modal Vessel Image Retrieval  Cited by: 2

Authors: GUAN Xin; GUO Jiaen; LU Yu (Naval Aviation University, Yantai 264001, China; Unit 91422 of the PLA, Yantai 265200, China)

Affiliations: [1] Naval Aviation University, Yantai 264001 [2] Unit 91422 of the PLA, Yantai 265200

Source: Journal of Electronics & Information Technology, 2023, No. 12, pp. 4411-4420 (10 pages)

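Funding: Special Fund of the Taishan Scholars Project (ts 201712072); National Defense Science and Technology Excellent Young Scientists Fund (2017-JCJQ-ZQ-003).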

Abstract: Current mainstream cross-modal image retrieval algorithms built on the Convolutional Neural Network (CNN) paradigm cannot effectively extract the fine details of ship images, and the cross-modal "heterogeneous gap" is difficult to eliminate. To address these problems, a Discriminant Adversarial Hash Transformer (DAHT) is proposed for fast cross-modal retrieval of ship images. The network adopts a dual-stream Vision Transformer (ViT) structure and relies on the self-attention mechanism of ViT to extract discriminative features from ship images; on this basis, a Hash Token structure is designed for hash generation. To eliminate the cross-modal differences between images of the same category, the whole retrieval framework is trained in an adversarial manner, and modality confusion is achieved by performing modality discrimination on the generated hash codes. At the same time, a feedback-based Normalized discounted cumulative gain Weighting based Discriminant Cross-modal Quintuplet Loss (NW-DCQL) is designed to preserve the network's semantic discrimination between images of different categories. In the four types of cross-modal retrieval tasks carried out on two datasets, the proposed method improves on the second-best retrieval results by 9.8%, 5.2%, 19.7%, and 21.6%, respectively (32 bit), and it also shows certain performance advantages in unimodal retrieval tasks.
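The abstract reports results for 32-bit hash codes but does not say how the codes are binarized or compared. As a minimal sketch of how hash-based cross-modal retrieval is commonly carried out, and not the authors' implementation, the example below assumes sign thresholding of real-valued network outputs and ranking by Hamming distance. The function names `binarize`, `hamming`, and `rank_by_hamming` and the 4-bit toy vectors are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack the sign bits of a real-valued vector into an integer hash code.

    Assumption: a value >= 0 maps to bit 1, otherwise to bit 0.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Toy 4-bit example: the query comes from one modality, the database from another.
query_code = binarize([0.9, -0.2, 0.4, -0.7])        # bits 1010
database_codes = [
    binarize([0.8, -0.1, 0.3, -0.5]),                # bits 1010, distance 0
    binarize([-0.6, 0.7, -0.2, 0.9]),                # bits 0101, distance 4
    binarize([0.5, 0.1, 0.2, -0.3]),                 # bits 1110, distance 1
]
ranking = rank_by_hamming(query_code, database_codes)
print(ranking)  # [0, 2, 1]
```

Because the comparison reduces to an XOR and a bit count, ranking cost grows with the code length rather than with the dimensionality of the underlying image features, which is what makes hash-based retrieval fast.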

Keywords: Cross-modal retrieval; Ship image; Adversarial training; Hash transform; Transformer

Classification: TN913 [Electronics and Telecommunications - Communication and Information Systems]

 
