Multi-branch and Multi-scale Self-attention Learning for Fine-grained Visual Categorization


Authors: ZHANG Feng; WANG Gao-cai [1] (School of Computer and Electronic Information, Guangxi University, Nanning 530000, China)

Affiliation: [1] School of Computer and Electronic Information, Guangxi University, Nanning 530000, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2023, No. 12, pp. 2784-2790 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (62062007).

Abstract: Fine-grained visual categorization (FGVC) is an important branch of computer vision. It remains challenging because deformation, occlusion, illumination differences, and similar factors cause high intra-class variance and low inter-class variance. This paper proposes an improved version of MMAL-Net (Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization) to better address weakly supervised fine-grained visual classification. The attention object location module (AOLM) predicts the position of the object in the image, and the attention part proposal module (APPM) proposes informative part regions without requiring bounding-box or part annotations. The resulting object images contain almost the entire structure of the object along with more detail, the part images cover many different scales and carry finer-grained features, and the raw images contain the complete object. These three types of images are used for supervised learning by a multi-branch network. An attention mechanism is also introduced: a Split-Attention module redistributes the weights of the outputs of the different branches, and SENet (Squeeze-and-Excitation Networks) is added so that the model attends to channel-wise features. The model shows good classification ability and robustness for images of different scales, can be trained end-to-end, and has a short inference time. Comprehensive experiments on the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets show that the method outperforms MMAL-Net in classification performance and is comparable to state-of-the-art methods.
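The two attention ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes feature maps stored as nested lists shaped `[C][H][W]`, and the weight matrices `w1` and `w2` and the per-branch `branch_scores` are hypothetical inputs that a real model would learn. `se_gate` follows the usual Squeeze-and-Excitation pattern (global average pooling, a bottleneck, a sigmoid gate per channel). `fuse_branches` is only a much-simplified stand-in for Split-Attention: it applies a softmax over branches and takes a weighted sum of their class logits.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def se_gate(feature_map, w1, w2):
    """Squeeze-and-Excitation gating on a feature map shaped [C][H][W].

    w1 has shape [C_reduced][C] and w2 has shape [C][C_reduced].
    Both are hypothetical weights supplied by the caller.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[v * g for v in row] for row in channel]
        for channel, g in zip(feature_map, gates)
    ]
    return scaled, gates


def fuse_branches(branch_logits, branch_scores):
    """Softmax-weighted sum of per-branch class logits (simplified fusion)."""
    top = max(branch_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in branch_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    num_classes = len(branch_logits[0])
    fused = [
        sum(w * logits[k] for w, logits in zip(weights, branch_logits))
        for k in range(num_classes)
    ]
    return fused, weights
```

In this sketch the three branches described in the abstract (raw, object, and part images) would each contribute one entry to `branch_logits`. How the branch scores are actually produced in the paper is not stated in the abstract, so they are left as an input here.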

Keywords: fine-grained visual categorization; weakly supervised learning; attention mechanism; Split-Attention; SENet

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]

 
