Orthogonal fusion image descriptor based on global attention

Authors: AI Liefu; TAO Yong; JIANG Changyu (School of Computer and Information, Anqing Normal University, Anqing, Anhui 246133, China; School of Smart Transportation Modern Industry, Anhui Sanlian University, Hefei, Anhui 230601, China)

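Affiliations: [1] School of Computer and Information, Anqing Normal University, Anqing, Anhui 246133, China [2] School of Smart Transportation Modern Industry, Anhui Sanlian University, Hefei, Anhui 230601, China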

Source: Journal of Graphics (《图学学报》), 2024, No. 3, pp. 472-481 (10 pages)

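Funding: Natural Science Foundation of Anhui Province (1608085MF144, 1908085MF194); Key Natural Science Research Project of Anhui Higher Education Institutions (KJ2020A0498).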

Abstract: Image descriptors are an important research object in computer vision and are widely used in image classification, segmentation, recognition, and retrieval. In deep image descriptors, the local feature extraction branch does not model the correlation between the spatial and channel information of high-dimensional features, so the local features carry insufficient information. To address this, an image descriptor that fuses local and global features is proposed. In the local branch, dilated convolutions extract multi-scale feature maps; the outputs are concatenated and passed through a global attention mechanism containing a multilayer perceptron, which captures correlated channel-spatial information, and the result is further processed to give the final local features. In the high-dimensional global branch, global pooling and full convolution generate the global feature vector. The component of the local features orthogonal to the global feature vector is extracted, concatenated with the global feature, and aggregated to form the final descriptor. For feature constraint, an angular-domain loss function with sub-class centers is used to improve the robustness of the model on large-scale datasets. In experiments on the public datasets Roxford5k and Rparis6k, the average retrieval accuracy of the proposed descriptor in the medium and hard modes is 81.87% and 59.74%, and 91.61% and 79.12%, respectively. These figures are 1.70%, 1.56%, 2.00%, and 1.83% higher than those of the deep orthogonal fusion descriptor, and the proposed descriptor also achieves better retrieval accuracy than the other image descriptors compared.
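The orthogonal fusion step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the use of plain Python lists, and the choice of average pooling as the aggregation step are all assumptions made for illustration. Each local feature has its projection onto the global vector removed, the remaining orthogonal component is concatenated with the global vector, and the results are averaged over all spatial positions.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_component(local: Sequence[float], global_vec: Sequence[float]) -> Vector:
    """Return the part of `local` that is orthogonal to `global_vec`.

    The projection of `local` onto `global_vec` is subtracted, so the result
    has (numerically) zero inner product with `global_vec`.
    """
    norm_sq = dot(global_vec, global_vec)
    if norm_sq == 0.0:
        # A zero global vector defines no direction; nothing is removed.
        return list(local)
    scale = dot(local, global_vec) / norm_sq
    return [l - scale * g for l, g in zip(local, global_vec)]


def orthogonal_fusion(local_feats: Sequence[Sequence[float]],
                      global_vec: Sequence[float]) -> Vector:
    """Fuse local and global features into a single descriptor.

    Hypothetical aggregation (an assumption, not taken from the paper): each
    spatial position contributes [global_vec, orthogonal component], and the
    positions are average-pooled into one vector of length 2 * C.
    """
    if not local_feats:
        raise ValueError("at least one local feature is required")
    fused = [list(global_vec) + orthogonal_component(f, global_vec)
             for f in local_feats]
    n = len(fused)
    return [sum(column) / n for column in zip(*fused)]


if __name__ == "__main__":
    g = [1.0, 0.0]                      # global feature vector (C = 2)
    locals_ = [[2.0, 3.0], [4.0, 1.0]]  # two spatial positions
    print(orthogonal_fusion(locals_, g))  # [1.0, 0.0, 0.0, 2.0]
```

In this toy example the global vector lies along the first axis, so the orthogonal components keep only the second coordinate of each local feature. A real model would operate on tensors of shape (C, H, W) and would typically follow the pooling with a learned projection layer.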

Keywords: image descriptor; dilated convolution; global attention; feature fusion; sub-class center angular-domain loss

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
