Authors: Naijun LIU, Fuchun SUN, Bin FANG, Huaping LIU
Affiliation: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Published in: Science China (Information Sciences), 2024, No. 8, pp. 202-216 (15 pages)
Funding: supported by the "New Generation Artificial Intelligence" Key Field Research and Development Plan of Guangdong Province (Grant No. 2021B0101410002); the National Science and Technology Major Project of the Ministry of Science and Technology of China (Grant No. 2018AAA0102900); and the National Natural Science Foundation of China (Grant Nos. U22A2057, 62133013).
Abstract: Skill learning through reinforcement learning has progressed significantly in recent years. However, it often struggles to efficiently find optimal or near-optimal policies due to the inherent trial-and-error exploration in reinforcement learning. Although algorithms have been proposed to enhance skill learning efficacy, there is still much room for improvement in terms of skill learning performance and training stability. In this paper, we propose an algorithm called skill enhancement learning with knowledge distillation (SELKD), which integrates multiple actors and multiple critics for skill learning. SELKD employs knowledge distillation to establish a mutual learning mechanism among actors. To mitigate critic overestimation bias, we introduce a novel target value calculation method. We also perform theoretical analysis to ensure the convergence of SELKD. Finally, experiments are conducted on several continuous control tasks, illustrating the effectiveness of the proposed algorithm.
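The abstract describes two components: a mutual learning mechanism among actors via knowledge distillation, and a target value calculation that mitigates critic overestimation. The paper's exact formulations are not given here, so the sketch below is only illustrative: it uses a symmetric KL penalty between Gaussian policies for the mutual distillation term, and a TD3-style pessimistic minimum over critic estimates as a stand-in for the paper's (unspecified) target value method. The function names and the loss weight are hypothetical.

```python
import numpy as np

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    # KL(P || Q) between two 1-D Gaussian action distributions
    return (np.log(sigma_q / sigma_p)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2.0 * sigma_q**2)
            - 0.5)

def mutual_distillation_losses(actors, weight=0.1):
    # actors: list of (mu, sigma) pairs, one per actor, for the same state.
    # Each actor is regularized toward its peers' action distributions,
    # giving the mutual-learning effect described in the abstract.
    losses = []
    for i, (mu_i, s_i) in enumerate(actors):
        peer_kls = [gaussian_kl(mu_j, s_j, mu_i, s_i)
                    for j, (mu_j, s_j) in enumerate(actors) if j != i]
        losses.append(weight * sum(peer_kls) / len(peer_kls))
    return losses

def pessimistic_target(reward, gamma, critic_values, done):
    # Placeholder for the paper's target value method: taking the minimum
    # over critics (as in clipped double Q-learning) reduces overestimation.
    return reward + gamma * (1.0 - done) * min(critic_values)
```

In a full training loop, each actor's policy-gradient objective would be augmented with its distillation loss, and the critics would regress toward the pessimistic target; the actual SELKD update rules should be taken from the paper itself.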
Keywords: skill learning; enhancement learning; reinforcement learning; knowledge distillation
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]