Authors: Xiangxian Li, Yuze Zheng, Haokai Ma, Zhuang Qi, Xiangxu Meng, Lei Meng
Affiliation: [1] School of Software, Shandong University, Jinan 250101, China
Source: Computational Visual Media (计算可视媒体(英文版)), 2024, Issue 5, pp. 981-992 (12 pages)
Funding: Supported in part by the National Natural Science Foundation of China (62006141); the National Key R&D Program of China (2021YFC3300203); the Overseas Innovation Team Project of the "20 Regulations for New Universities" Funding Program of Jinan (2021GXRC073); and the Excellent Youth Scholars Program of Shandong Province (2022HWYQ-048).
Abstract: The prevalence of long-tailed distributions in real-world data often results in classification models favoring the dominant classes, neglecting the less frequent ones. Current approaches address the issues in long-tailed image classification by rebalancing data, optimizing weights, and augmenting information. However, these methods often struggle to balance the performance between dominant and minority classes because of inadequate representation learning of the latter. To address these problems, we introduce descriptional words into images as cross-modal privileged information and propose a cross-modal enhanced method for long-tailed image classification, referred to as CMLTNet. CMLTNet improves the learning of intraclass similarity of tail-class representations by cross-modal alignment and captures the difference between the head and tail classes in semantic space by cross-modal inference. After fusing the above information, CMLTNet achieved an overall performance that was better than those of benchmark long-tailed and cross-modal learning methods on the long-tailed cross-modal datasets, NUS-WIDE and VireoFood-172. The effectiveness of the proposed modules was further studied through ablation experiments. In a case study of feature distribution, the proposed model was better in learning representations of tail classes, and in the experiments on model attention, CMLTNet has the potential to help learn some rare concepts in the tail class through mapping to the semantic space.
Keywords: long-tailed classification; cross-modal learning; representation learning; privileged information
Classification No.: TP391.41 [Automation and Computer Technology: Computer Application Technology]