Robot Control Method Based on Double Limit Q Learning (基于双重限制Q学习的机器人控制方法)

Cited by: 1

Authors: ZHOU Weiqing (周维庆); WANG Fei (王飞) [3]; ZHAO Dejing (赵德京)

Affiliations: [1] College of Automation, Qingdao University, Qingdao 266071, China; [2] Shandong Key Laboratory of Industrial Control Technology, Qingdao 266071, China; [3] Shandong Weifang Tobacco Co., Ltd., Weifang 262400, China

Source: Automation & Instrumentation (《自动化与仪表》), 2024, No. 3, pp. 61-65 (5 pages)

Funding: National Natural Science Foundation of China (61903209).

Abstract: Offline reinforcement learning, which can train a policy to satisfactory performance without any interaction between the agent and the environment, has developed rapidly in recent years. To mitigate extrapolation error and the excessive conservatism of offline reinforcement learning algorithms, this paper proposes DIQL, an offline reinforcement learning algorithm based on double limit Q learning. The first limit keeps the Q network's estimate for out-of-distribution (OOD) actions from deviating too far from the state-value (V) estimate of the data-augmented state. The second limit keeps the mean squared error between the OOD actions produced by the policy and the dataset distribution from becoming too large. Within these two limits the algorithm is encouraged to explore, so it can still achieve good results when the dataset quality is poor. To verify the algorithm, experiments are carried out in a gait-control environment for a bipedal 6-DOF robot. The results show that DIQL handles OOD actions effectively and alleviates both extrapolation error and over-conservatism.

Keywords: offline reinforcement learning; OOD; Q learning; extrapolation error; bipedal robot

Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
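
The two limits described in the abstract can be pictured as two penalty terms added to a training loss. The sketch below is only an illustration under stated assumptions: the squared-error form of the Q limit, the weights `alpha` and `beta`, and all function names are hypothetical and are not taken from the paper, which may define these terms differently.

```python
# Illustrative sketch only. The penalty forms, the weights alpha/beta, and
# every name below are assumptions for illustration, not the paper's method.

def q_limit_penalty(q_ood, v_aug):
    """Mean squared gap between Q(s, a_ood) and V(augmented s) over a batch.

    q_ood: list of Q estimates for OOD actions, one per sample.
    v_aug: list of V estimates for the data-augmented states, one per sample.
    """
    return sum((q - v) ** 2 for q, v in zip(q_ood, v_aug)) / len(q_ood)


def action_limit_penalty(policy_actions, dataset_actions):
    """Mean squared error between policy actions and dataset actions.

    Both arguments are lists of action vectors (lists of floats).
    """
    total, count = 0.0, 0
    for pa, da in zip(policy_actions, dataset_actions):
        for p, d in zip(pa, da):
            total += (p - d) ** 2
            count += 1
    return total / count


def double_limit_penalty(q_ood, v_aug, policy_actions, dataset_actions,
                         alpha=1.0, beta=1.0):
    """Weighted sum of the two penalties (weights are hypothetical)."""
    return (alpha * q_limit_penalty(q_ood, v_aug)
            + beta * action_limit_penalty(policy_actions, dataset_actions))


if __name__ == "__main__":
    penalty = double_limit_penalty(
        q_ood=[1.0, 3.0], v_aug=[1.0, 1.0],
        policy_actions=[[0.0, 1.0]], dataset_actions=[[0.0, 0.0]],
        alpha=1.0, beta=2.0,
    )
    print(penalty)
```

In this reading, a small penalty leaves the policy free to pick actions outside the dataset, while a large one pulls both the Q estimates and the actions back toward what the data supports. That is one way to trade exploration against conservatism.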
