Deep Supervised Hashing Image Retrieval Method Based on Swin Transformer (Cited by: 7)


Authors: MIAO Zhuang; ZHAO Xinxin; LI Yang; WANG Jiabao; ZHANG Rui

Affiliation: [1] Command and Control Engineering College, Army Engineering University of PLA, Nanjing 210007, Jiangsu, China

Source: Journal of Hunan University (Natural Sciences), 2023, No. 8, pp. 62-71 (10 pages)

Funding: National Natural Science Foundation of China (61806220); National Key R&D Program of China (2017YFC0821905).

Abstract: Feature extraction in deep supervised hashing for image retrieval has long been dominated by convolutional neural network architectures. However, with the application of Transformers in the vision field, it has become possible to replace convolutional neural networks with Transformer architectures. To address the limitations of existing Transformer-based hashing methods, such as the inability to generate hierarchical representations and high computational complexity, a deep supervised hashing image retrieval method based on Swin Transformer is proposed. The method builds on the Swin Transformer network model and appends a hash layer at the end of the network to generate hash codes for images. By introducing the concepts of locality and hierarchy into the model, the method effectively solves the above problems. Compared with 13 existing state-of-the-art methods, the proposed method substantially improves hash retrieval performance. Experiments were carried out on two commonly used retrieval datasets, CIFAR-10 and NUS-WIDE. The results show that the proposed method achieves a highest mean average precision (mAP) of 98.4% on CIFAR-10, an average improvement of 7.1% over the TransHash method and 0.57% over the VTS16-CSQ method. On NUS-WIDE, the proposed method achieves a highest mAP of 93.6%, an average improvement of 18.61% over TransHash and an average increase of 8.6% in retrieval accuracy over VTS16-CSQ.
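The abstract describes appending a hash layer to the end of the Swin Transformer backbone to map pooled features to binary codes. A minimal NumPy sketch of such a hash head, with the backbone stubbed by random features (layer sizes, names, and the tanh relaxation are illustrative assumptions, not details taken from the paper):

```python
import numpy as np

def hash_head(features: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Project backbone features to K hash logits, then binarize.

    features: (batch, dim) pooled backbone output
    weight:   (dim, K) hash-layer weights
    bias:     (K,) hash-layer bias
    Returns {-1, +1} codes of shape (batch, K).
    """
    logits = np.tanh(features @ weight + bias)  # smooth relaxation of sign()
    return np.where(logits >= 0, 1, -1)         # hard binarization at retrieval time

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two {-1, +1} codes (used to rank retrieved images)."""
    return int(np.sum(a != b))

rng = np.random.default_rng(0)
dim, K = 768, 64                       # hypothetical pooled feature dim, 64-bit codes
feats = rng.standard_normal((4, dim))  # stand-in for Swin Transformer features
W = rng.standard_normal((dim, K)) * 0.01
b = np.zeros(K)

codes = hash_head(feats, W, b)
print(codes.shape)  # (4, 64)
```

At retrieval time, a query image's code is compared against the database codes by Hamming distance, which is why compact binary codes make large-scale search cheap.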

Keywords: hash learning; deep learning; image retrieval; Swin Transformer

Classification: TP391 [Automation and Computer Technology — Computer Application Technology]

 
