Cross-modal learning using privileged information for long-tailed image classification  


Authors: Xiangxian Li, Yuze Zheng, Haokai Ma, Zhuang Qi, Xiangxu Meng, Lei Meng

Affiliation: [1] School of Software, Shandong University, Jinan 250101, China

Source: Computational Visual Media (English edition), 2024, Issue 5, pp. 981-992, 12 pages

Funding: Supported in part by the National Natural Science Foundation of China (62006141); the National Key R&D Program of China (2021YFC3300203); the Overseas Innovation Team Project of the "20 Regulations for New Universities" Funding Program of Jinan (2021GXRC073); and the Excellent Youth Scholars Program of Shandong Province (2022HWYQ-048).

Abstract: The prevalence of long-tailed distributions in real-world data often results in classification models favoring the dominant classes and neglecting the less frequent ones. Current approaches address the issues in long-tailed image classification by rebalancing data, optimizing weights, and augmenting information. However, these methods often struggle to balance the performance between dominant and minority classes because of inadequate representation learning of the latter. To address these problems, we introduce descriptive words for images as cross-modal privileged information and propose a cross-modal enhanced method for long-tailed image classification, referred to as CMLTNet. CMLTNet improves the learning of intraclass similarity of tail-class representations by cross-modal alignment and captures the difference between the head and tail classes in semantic space by cross-modal inference. After fusing the above information, CMLTNet achieved an overall performance that was better than those of benchmark long-tailed and cross-modal learning methods on the long-tailed cross-modal datasets NUS-WIDE and VireoFood-172. The effectiveness of the proposed modules was further studied through ablation experiments. In a case study of feature distribution, the proposed model was better at learning representations of tail classes, and in the experiments on model attention, CMLTNet showed the potential to help learn some rare concepts in the tail classes through mapping to the semantic space.

Keywords: long-tailed classification; cross-modal learning; representation learning; privileged information

Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]
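
Note on the "optimizing weights" baseline: the abstract lists class reweighting among the existing approaches to long-tailed classification. The sketch below illustrates one common form of that idea, inverse-frequency class weights. It is a generic illustration, not CMLTNet and not code from the paper; the function name and the toy labels are assumptions made for this example.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a per-class weight inversely proportional to class frequency.

    Uses weight_c = N / (K * n_c), where N is the number of samples,
    K the number of classes, and n_c the number of samples in class c.
    With this scaling every class contributes the same total weight (N / K).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Hypothetical toy label set: one head class and two tail classes.
labels = ["dog"] * 8 + ["lynx"] * 1 + ["okapi"] * 1
weights = inverse_frequency_weights(labels)

for name, weight in sorted(weights.items()):
    print(f"{name}: {weight:.3f}")
```

Tail classes receive larger weights than the head class, so each class carries equal total weight in a weighted loss. As the abstract points out, reweighting of this kind does not by itself improve how tail-class representations are learned, which is the gap the cross-modal approach targets.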

 
