Fine-grained Image Classification by Integrating Object Localization and Heterogeneous Local Interactive Learning


Authors: CHEN Quan; CHEN Fei[1]; WANG Yan-Gen; CHENG Hang[2]; WANG Mei-Qing[2]

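Affiliations: [1] College of Computer and Data Science, Fuzhou University, Fuzhou 350108; [2] School of Mathematics and Statistics, Fuzhou University, Fuzhou 350108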

Source: Acta Automatica Sinica (《自动化学报》), 2024, Issue 11, pp. 2219-2230 (12 pages)

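Funding: Supported by the National Natural Science Foundation of China (61771141, 62172098) and the Natural Science Foundation of Fujian Province (2021J01620).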

Abstract: Fine-grained images exhibit small inter-class variance and large intra-class variance. Existing classification algorithms focus only on extracting and learning representations of salient local features within a single image and ignore the heterogeneous local semantic discriminative information across multiple images. As a result, they struggle to attend to the subtle details that separate categories, and the learned features are insufficiently discriminative. This paper proposes a progressive network that learns information at different granularity levels of an image in a weakly supervised manner. First, an attention accumulation object localization module (AAOLM) is constructed; on a single image, it integrates attention information from different training epochs and feature extraction stages to localize the semantic object. Second, a multi-image heterogeneous local interactive graph module (HLIGM) is designed; it extracts the salient local region features of each image, builds a graph network over the local region features of multiple images under the guidance of category labels, and aggregates these local features to strengthen the discriminative power of the representation. Finally, knowledge distillation feeds the optimization information produced by HLIGM back to the backbone network, so that the backbone can directly extract strongly discriminative features and the computational cost of building the graph at test time is avoided. Experiments on multiple datasets demonstrate the effectiveness of the proposed method and show that it improves fine-grained classification accuracy.
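The distillation step in the abstract can be illustrated with a generic sketch. The abstract does not give the paper's loss, so the code below assumes the common temperature-softened formulation (KL divergence between teacher and student distributions, mixed with cross-entropy). Here the graph branch (HLIGM) is treated as the teacher and the backbone as the student; the function names, `alpha`, and `temperature` are hypothetical and not taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: a generic distillation term, not the paper's exact loss.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (graph branch)
    log_q = log_softmax(student_logits, temperature)  # student (backbone)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def cross_entropy(logits, label):
    """Standard cross-entropy of unsoftened logits against a class index."""
    return -log_softmax(logits)[label]


def total_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Weighted sum of the supervised term and the distillation term.

    alpha is a hypothetical mixing weight chosen for illustration.
    """
    ce = cross_entropy(student_logits, label)
    kd = distillation_loss(student_logits, teacher_logits, temperature)
    return (1.0 - alpha) * ce + alpha * kd


if __name__ == "__main__":
    backbone_logits = [2.0, 0.5, -1.0]  # made-up student outputs
    graph_logits = [2.5, 0.1, -1.2]     # made-up teacher outputs
    print(total_loss(backbone_logits, graph_logits, label=0))
```

Under this assumption only the backbone is evaluated at test time; the teacher logits, and therefore the graph, are needed only during training, which matches the abstract's point about avoiding graph construction in the test phase.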

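Keywords: deep learning; fine-grained image classification; weakly supervised object localization; graph neural network; knowledge distillation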

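Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Computer Science and Technology]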

 
