Transductive zero-shot image classification based on self-supervised enhancement feature


Authors: WANG Hao-yu; ZHANG Xin-ran; WANG Xue-song; CHENG Yu-hu (School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China)

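Affiliation: [1] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China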

Source: Control and Decision, 2024, No. 5, pp. 1707-1717 (11 pages)

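Funding: National Natural Science Foundation of China (62176259, 61976215); Natural Science Foundation of Jiangsu Province (BK20221116); Jiangsu Excellent Postdoctoral Program (2022ZB530).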

Abstract: The visual features of images play a crucial role in zero-shot image classification. Although deep features extracted by networks such as VGG, GoogLeNet, and ResNet are widely used in image classification, their performance on zero-shot image classification is not ideal and leaves considerable room for improvement. In addition, because the training and test sets are disjoint in the zero-shot learning setting, the classification network inevitably suffers from domain shift. Therefore, a transductive zero-shot image classification framework based on self-supervised enhanced features is proposed. First, pseudo-labels are constructed via an auxiliary task, and self-supervised learning is used to obtain self-supervised image features, which are then fused with the unsupervised deep features. Next, the fused features are embedded in the semantic space for zero-shot image classification, yielding initial predicted labels for the unseen classes. Finally, the unseen-class features and their predicted labels are used to iteratively optimize the visual-semantic mapping. The framework components are interchangeable; in this work the self-supervised network, backbone network, and dimensionality-reduction network are CFN, VGG16, and PCA, respectively. Experiments on the CUB, SUN, and AwA2 datasets show that the proposed network enhances the discriminative capability of the features and performs well on zero-shot image classification.
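The three stages described in the abstract (feature fusion, embedding into the semantic space, and iterative refinement with predicted labels) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the form of the visual-semantic mapping, so ridge regression and cosine nearest-prototype classification are used here as stand-ins, and fusion is assumed to be simple concatenation followed by PCA. The CFN and VGG16 feature extractors are not reproduced; their outputs are assumed to be precomputed arrays. All function and parameter names (`fuse`, `fit_ridge`, `transductive_zsl`, `lam`, `n_iter`) are hypothetical.

```python
import numpy as np

def fuse(f_ssl, f_deep, k):
    """Concatenate two feature sets and reduce to k dimensions with PCA.
    Assumption: fusion is plain concatenation; the paper may differ."""
    x = np.hstack([f_ssl, f_deep])
    xc = x - x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T

def fit_ridge(x, s, lam=1.0):
    """Closed-form ridge regression: find W such that x @ W approximates s."""
    d = x.shape[1]
    return np.linalg.solve(x.T @ x + lam * np.eye(d), x.T @ s)

def predict(x, w, prototypes):
    """Map features to semantic space and pick the most similar class
    prototype by cosine similarity."""
    z = x @ w
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-12)
    return np.argmax(z @ p.T, axis=1)

def transductive_zsl(x_seen, y_seen, a_seen, x_unseen, a_unseen,
                     n_iter=5, lam=1.0):
    """Learn a mapping on seen classes, then refine it using unseen-class
    features paired with their current predicted labels."""
    # Initial mapping and initial predictions for the unseen classes.
    w = fit_ridge(x_seen, a_seen[y_seen], lam)
    y_hat = predict(x_unseen, w, a_unseen)
    for _ in range(n_iter):
        # Refit on seen data plus pseudo-labeled unseen data.
        x_all = np.vstack([x_seen, x_unseen])
        s_all = np.vstack([a_seen[y_seen], a_unseen[y_hat]])
        w = fit_ridge(x_all, s_all, lam)
        y_new = predict(x_unseen, w, a_unseen)
        if np.array_equal(y_new, y_hat):
            break  # predictions are stable
        y_hat = y_new
    return y_hat, w
```

Here `a_seen` and `a_unseen` are assumed to be per-class semantic vectors (for example, attribute vectors), one row per class, and `y_seen` holds integer class indices into `a_seen`. Fitting PCA jointly on seen and unseen features, as `fuse` does when given all samples at once, is itself a transductive choice made for this sketch.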

Keywords: zero-shot learning; self-supervised learning; transductive; visual-semantic mapping; feature fusion; image classification

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
