Deep hashing method based on cross-scale Vision Transformer

Authors: Yao Peiyun; Yu Jiong [2]; Li Xue [2]; Li Ziyang; Chen Pengcheng

Affiliations: [1] School of Software, Xinjiang University, Urümqi 830046, China; [2] School of Computer Science & Technology, Xinjiang University, Urümqi 830046, China

Source: Application Research of Computers (《计算机应用研究》), 2024, No. 11, pp. 3477-3483 (7 pages)

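Funding: National Natural Science Foundation of China (62262064, 62266043, 61966035); Key Research and Development Program of Xinjiang Uygur Autonomous Region (2022295358); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C56); Xinjiang University Doctoral Student Innovation Project (XJU2022BS072).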

Abstract: Current deep hashing algorithms extract cross-scale features poorly and struggle to fit the global similarity distribution of the data. To address both problems, this paper proposes a deep hashing method based on a cross-scale Vision Transformer. First, pyramid convolution and a cross-scale attention mechanism are used to build a multi-level encoder that captures the rich semantic information of an image. Second, a proxy-based deep hashing algorithm generates a hash proxy for each category, so that hash codes learn discriminative class features, move closer to the hash proxy of their own category, and fit the global similarity distribution of the data. Finally, an angular margin term is added between hash proxies and hash codes to increase intra-class similarity and inter-class difference, yielding highly discriminative hash codes. Experiments on CIFAR-10, ImageNet-100, NUS-WIDE, and MS COCO show that the method improves average retrieval precision over the second-best method by 4.42%, 19.61%, 0.35%, and 15.03%, respectively, which verifies its effectiveness.
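The abstract does not give the loss function, so the following is only a minimal sketch of the general idea of pulling hash codes toward per-class proxies with an angular margin. It assumes single-label data, cosine similarity between relaxed (real-valued) codes and proxies, and an additive margin on the true-class angle followed by softmax cross-entropy. The names `proxy_margin_loss` and `binarize` and the parameters `margin` and `scale` are hypothetical and are not taken from the paper.

```python
import numpy as np

def proxy_margin_loss(codes, proxies, labels, margin=0.2, scale=8.0):
    """Cross-entropy over scaled cosines, with an additive angular margin
    on the angle between each code and the proxy of its own class.

    codes:   (n, k) real-valued relaxed hash codes (assumed encoder output)
    proxies: (c, k) one proxy vector per class (assumed learnable)
    labels:  (n,)   integer class ids (single-label assumption)
    """
    # Normalize so that dot products are cosines of the angles.
    codes = codes / np.linalg.norm(codes, axis=1, keepdims=True)
    proxies = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    cos = np.clip(codes @ proxies.T, -1.0, 1.0)  # (n, c)
    theta = np.arccos(cos)

    # Enlarge the true-class angle by the margin, capped at pi so that
    # the cosine stays monotonically decreasing in the angle.
    rows = np.arange(len(labels))
    theta[rows, labels] = np.minimum(theta[rows, labels] + margin, np.pi)

    # Numerically stable log-softmax over the scaled cosines.
    logits = scale * np.cos(theta)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()

def binarize(codes):
    """Map relaxed codes to {-1, +1} hash bits by sign (zero maps to +1)."""
    return np.where(codes >= 0, 1, -1)
```

In this sketch the margin makes the true-class logit smaller during training, so a code must sit closer in angle to its own proxy than to any other proxy before the loss becomes small. Multi-label datasets such as NUS-WIDE and MS COCO would need a different formulation, which the abstract does not describe.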

Keywords: deep hashing; visual attention; hash proxy; cross-scale; image retrieval

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
