Multi-branch and Multi-scale Self-attention Learning for Fine-grained Visual Categorization

Authors: ZHANG Feng; WANG Gao-cai [1] (School of Computer and Electronic Information, Guangxi University, Nanning 530000, China)

Affiliation: [1] School of Computer and Electronic Information, Guangxi University, Nanning 530000

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2023, No. 12, pp. 2784-2790 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (62062007).

Abstract: Fine-grained visual categorization (FGVC) is an important branch of computer vision. It remains challenging because deformation, occlusion, and illumination differences produce high intra-class variance and low inter-class variance. This paper improves MMAL-Net (Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization) to better address weakly supervised fine-grained visual classification. The attention object location module (AOLM) predicts the position of the object in the image, and the attention part proposal module (APPM) proposes informative part regions without bounding-box or part annotations. The resulting object images contain almost the entire structure of the object along with more detail; the part images span many scales and carry finer-grained features; and the raw images contain the complete object. These three types of images are used for supervised training of a multi-branch network. An attention mechanism is introduced: a Split-Attention module redistributes the weights of the outputs of the different branches, and SENet (Squeeze-and-Excitation Networks) is introduced so that the model attends to channel features. The model shows good classification ability and robustness on images of different scales, can be trained end-to-end, and has a short inference time. Comprehensive experiments on the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets show that the method outperforms MMAL-Net in classification performance and is comparable to state-of-the-art methods.
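The two attention ideas named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract gives no layer sizes, weights, or wiring, so the function names (`se_channel_weights`, `se_rescale`, `fuse_branches`), the bias-free fully connected layers, and the use of one scalar score per branch are all assumptions made for illustration. The first part follows the general Squeeze-and-Excitation pattern (global average pooling, then a small gating network, then channel rescaling); the second applies a softmax across branches in the spirit of Split-Attention to reweight branch outputs.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def se_channel_weights(feature_map, w1, w2):
    """Squeeze-and-Excitation style gate: one weight in (0, 1) per channel.

    feature_map: list of C channels, each an H x W list of lists.
    w1: (C/r) x C matrix, w2: C x (C/r) matrix. Both are hypothetical,
    untrained weights; biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling reduces each channel to one number.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: FC -> ReLU -> FC -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    return [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]


def se_rescale(feature_map, gates):
    """Scale every value in channel c by that channel's gate."""
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(feature_map, gates)]


def fuse_branches(branch_logits, branch_scores):
    """Softmax-weighted sum of per-branch class logits.

    branch_scores holds one scalar per branch (an assumption of this sketch);
    the softmax turns the scores into weights that sum to 1.
    """
    weights = softmax(branch_scores)
    n_classes = len(branch_logits[0])
    fused = [
        sum(w * logits[c] for w, logits in zip(weights, branch_logits))
        for c in range(n_classes)
    ]
    return fused, weights
```

For example, calling `fuse_branches` with three branch outputs (raw image, object image, part images) and equal scores gives each branch a weight of one third; unequal scores shift the fused prediction toward the branches with higher scores.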

Keywords: fine-grained visual categorization; weakly supervised learning; attention mechanism; Split-Attention; SENet

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
