Authors: Dong LIANG, Yue SUN, Yun DU, Songcan CHEN, Sheng-Jun HUANG
Affiliations: [1] MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; [2] Shenzhen Research Institute, Nanjing University of Aeronautics and Astronautics, Shenzhen 518000, China
Source: Science China (Information Sciences), 2024, Issue 9, pp. 122-141 (20 pages)
Funding: Partly supported by the National Natural Science Foundation of China (Grant Nos. 62272229, 62076124, 62222605); the National Key R&D Program of China (Grant No. 2020AAA0107000); the Natural Science Foundation of Jiangsu Province (Grant Nos. BK20222012, BK20211517); and the Shenzhen Science and Technology Program (Grant No. JCYJ20230807142001004).
Abstract: Current knowledge distillation (KD) methods primarily focus on transferring various forms of structured knowledge and designing corresponding optimization goals to encourage the student network to imitate the output of the teacher network. However, introducing too many additional optimization objectives may lead to unstable training, such as gradient conflicts. Moreover, these methods ignore the guidance offered by the relative learning difficulty between the teacher and student networks. Inspired by human cognitive science, in this paper we redefine knowledge from a new perspective: the relative difficulty of samples as assessed by the student and teacher networks, and propose a pixel-level KD paradigm for semantic segmentation named relative difficulty distillation (RDD). We propose a two-stage RDD framework: teacher-full evaluated RDD (TFE-RDD) and teacher-student evaluated RDD (TSE-RDD). RDD allows the teacher network to provide effective guidance on learning focus without additional optimization goals, thus avoiding the need to adjust learning weights for multiple losses. Extensive experimental evaluations using a general distillation loss function on popular datasets such as Cityscapes, CamVid, Pascal VOC, and ADE20K demonstrate the effectiveness of RDD against state-of-the-art KD methods. Additionally, our research shows that RDD can be integrated with existing KD methods to improve their upper performance bound. Code is available at https://github.com/sunyueue/RDD.git.
Keywords: knowledge distillation; semantic segmentation; relative difficulty; sample weighting; prediction discrepancy
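The abstract describes RDD's core idea at a high level: during pixel-level distillation, pixels are reweighted by how difficult they are for the student relative to the teacher, so the teacher guides learning focus without adding new optimization objectives. The sketch below only illustrates that general idea under stated assumptions; it is not the authors' implementation (see the repository linked above), and the function names, the cross-entropy-based difficulty estimate, and the normalization scheme are all illustrative choices.

```python
# Minimal sketch: relative-difficulty-weighted pixel-level distillation.
# Hypothetical; the actual RDD method is at https://github.com/sunyueue/RDD.git.
import torch
import torch.nn.functional as F

def relative_difficulty_weights(teacher_logits, student_logits, labels, ignore_index=255):
    """Weight each pixel by how much harder it is for the student than the teacher.

    Difficulty is approximated here by per-pixel cross-entropy; pixels where the
    student struggles relative to the teacher receive larger weights.
    """
    with torch.no_grad():
        t_ce = F.cross_entropy(teacher_logits, labels, reduction="none",
                               ignore_index=ignore_index)          # (B, H, W)
        s_ce = F.cross_entropy(student_logits, labels, reduction="none",
                               ignore_index=ignore_index)          # (B, H, W)
        # Relative difficulty: student loss compared with teacher loss.
        rel = s_ce / (t_ce + 1e-6)
        # Normalize to mean 1 so the overall loss scale stays comparable.
        w = rel / (rel.mean() + 1e-6)
    return w

def weighted_pixel_kd_loss(teacher_logits, student_logits, labels, T=4.0):
    """Standard KL-based pixel-wise distillation, reweighted per pixel."""
    w = relative_difficulty_weights(teacher_logits, student_logits, labels)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)       # (B, H, W)
    return (w * kl).mean() * (T * T)

# Usage with random tensors standing in for segmentation network outputs.
if __name__ == "__main__":
    B, C, H, W = 2, 19, 64, 64                                     # e.g. 19 classes as in Cityscapes
    teacher = torch.randn(B, C, H, W)
    student = torch.randn(B, C, H, W)
    labels = torch.randint(0, C, (B, H, W))
    print(weighted_pixel_kd_loss(teacher, student, labels).item())
```

Because the weights are computed under `no_grad`, this reweighting steers the student's learning focus without introducing an extra loss term to balance, which is the property the abstract emphasizes.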