Authors: Yilin LYU, Liping JING, Jiaqi WANG, Mingzhe GUO, Xinyue WANG, Jian YU
Affiliations: [1] School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; [2] Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China; [3] Alibaba Group, Beijing 100102, China
Source: Science China (Information Sciences) [中国科学(信息科学)(英文版), English edition], 2023, Issue 3, pp. 184-199 (16 pages)
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2020AAA0106800); the Beijing Natural Science Foundation (Grant Nos. Z180006, L211016); the National Natural Science Foundation of China (Grant No. 62176020); the CAAI-Huawei MindSpore Open Fund; and the Chinese Academy of Sciences (Grant No. OEIP-O-202004).
Abstract: Distinguishing the subtle differences among fine-grained images from subordinate concepts of a concept hierarchy is a challenging task. In this paper, we propose a Siamese transformer with hierarchical concept embedding (STrHCE), which contains two transformer subnetworks sharing all configurations. Each subnetwork is equipped with hierarchical semantic information at different concept levels for fine-grained image embeddings. In particular, one subnetwork processes coarse-scale patches to learn the discriminative regions with the aid of the transformer's innate multi-head self-attention mechanism. The other subnetwork processes finer-scale patches, which are adaptively sampled from the discriminative regions, to capture subtle yet discriminative visual cues and eliminate redundant information. STrHCE connects the two subnetworks through a score margin adjustor that encourages the most discriminative regions to generate more confident predictions. Extensive experiments on four commonly used benchmark datasets, CUB-200-2011, FGVC-Aircraft, Stanford Dogs, and NABirds, empirically demonstrate the superiority of the proposed STrHCE over state-of-the-art baselines.
Keywords: fine-grained image recognition; transformer; hierarchical concept embedding; adaptive sampling; Siamese network
Classification code: TN957.52 [Electronics and Telecommunications: Signal and Information Processing]