Fine-Grained Visual Classification Method Based on Joint Discriminative Region Features

Authors: KANG Yu; HAO Xiaoli[1]

Affiliation: [1] School of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030000, China

Source: Computer Engineering and Applications, 2025, No. 2, pp. 227-233 (7 pages)

Funding: National Natural Science Foundation of China, General Program (62072326)

Abstract: The core of fine-grained visual classification is locating the discriminative regions in an image. Existing studies have strengthened the long-range dependencies among discriminative region features by adopting and improving the vision Transformer, but most methods are limited to enhancing attention on the most salient discriminative regions and ignore the feature information that can be jointly extracted from sub-salient discriminative regions. As a result, different categories with similar local features are hard to distinguish, and classification accuracy is low. This paper therefore proposes a joint discriminative region feature extraction method. First, candidate discriminative regions of the feature map are partitioned in front of the self-attention module, guiding the model to extract discriminative region features of different saliency levels. Second, a bilinear fusion self-attention module jointly extracts features from multiple discriminative regions of different saliency levels, obtaining more comprehensive discriminative region information. Experimental results show that a vision Transformer equipped with the joint discriminative region method reaches 92.7% accuracy on the CUB-200-2011 dataset, 2.4 percentage points higher than the standard vision Transformer, and it surpasses the current state-of-the-art fine-grained visual classification methods on the remaining benchmark datasets.
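The abstract describes two steps: partitioning candidate discriminative regions ahead of self-attention, then jointly extracting their features with bilinear fusion. The PyTorch sketch below illustrates one plausible form of the bilinear fusion step only; the module name, the mean-pooling of region tokens, the signed-sqrt normalization, and the region indices are all assumptions for illustration, not the paper's actual implementation. Candidate region selection is assumed to happen upstream (e.g., from attention scores).

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointRegionBilinearFusion(nn.Module):
    # Hypothetical sketch of bilinear fusion over a salient and a
    # sub-salient candidate region; NOT the paper's exact module.
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        # project the D*D bilinear interaction back to D dims (assumption)
        self.reduce = nn.Linear(dim * dim, dim)
        self.head = nn.Linear(dim, num_classes)  # classification head

    def forward(self, tokens, salient_idx, secondary_idx):
        # tokens: (B, N, D) patch embeddings from a vision Transformer encoder
        salient = tokens[:, salient_idx].mean(dim=1)      # (B, D) pooled salient region
        secondary = tokens[:, secondary_idx].mean(dim=1)  # (B, D) pooled sub-salient region
        # bilinear interaction: outer product of the two region descriptors
        joint = torch.einsum('bi,bj->bij', salient, secondary).flatten(1)  # (B, D*D)
        # signed sqrt + L2 normalization, standard in bilinear pooling
        joint = torch.sign(joint) * torch.sqrt(joint.abs() + 1e-8)
        joint = F.normalize(joint, dim=1)
        return self.head(self.reduce(joint))              # (B, num_classes) logits

# Toy usage with hypothetical region indices (dim kept small for the sketch).
model = JointRegionBilinearFusion(dim=64, num_classes=200)
tokens = torch.randn(2, 196, 64)           # 2 images, 196 patch tokens
salient_idx = torch.arange(0, 16)          # assumed indices of the salient region
secondary_idx = torch.arange(16, 32)       # assumed indices of a sub-salient region
logits = model(tokens, salient_idx, secondary_idx)   # shape: (2, 200)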

Keywords: fine-grained visual classification; discriminative region; vision Transformer; self-attention mechanism

CLC Number: TP183 (Automation and Computer Technology: Control Theory and Control Engineering); TP391.41 (Automation and Computer Technology: Control Science and Engineering)

 
