Intelligent trajectory planning for robots based on reinforcement learning with KCCA inference

Cited by: 3


Authors: FU Jian (傅剑) [1]; TENG Xiang (滕翔); CAO Ce (曹策); LOU Ping (娄平) [2]

Affiliations: [1] School of Automation, Wuhan University of Technology, Wuhan 430070, Hubei, China; [2] School of Information, Wuhan University of Technology, Wuhan 430070, Hubei, China

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2019, No. 11, pp. 96-102 (7 pages)

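Funding: National Natural Science Foundation of China (61773299, 51575412); Wuhan University of Technology Excellent Master's Thesis Cultivation Project (2017-YS-066)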

Abstract: Current learning-from-demonstration and reinforcement learning (LfDRL) frameworks do not account for the correlation between a robot's joints when facing a new task, which degrades learning performance. To address this shortcoming, the idea of the pseudo-covariance matrix is used to construct a local coordinate system for inter-joint correlated perturbations, oriented toward a functional performance index and based on the reproducing kernel Hilbert space (RKHS) and the generalized Rayleigh quotient. On this basis, a reinforcement learning method that integrates kernel canonical correlation analysis (KCCA) with policy improvement with path integrals (PI^2) is designed. KCCA is applied to learning experience data to infer the implicit nonlinear heuristic information among the robot's joints for the trajectory planning task. This information guides PI^2 toward an optimal or suboptimal policy, so that the robot transfers quickly from the demonstrated trajectory planning task to a new one and completes it with high quality. Via-point (one-point and two-point) transfer-learning trajectory planning experiments were conducted on a selective compliance assembly robot arm (SCARA) and a Universal Robots UR5 robot. The results show that the average cost reduction rate of reinforcement learning with KCCA-inferred heuristic information is clearly better than that of the classic PI^2 algorithm, and that the resulting trajectory planning both speeds up learning convergence and improves the accuracy with which the robot completes the new task.
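The PI^2 step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, episode-level form of the cost-weighted averaging used in PI^2-style methods, and it omits the per-time-step weighting and basis-function projection of the full algorithm. The function names `pi2_update` and `run_toy_demo`, the sensitivity parameter `h`, the two-joint quadratic cost, and the via-point values are all assumptions made for illustration. The exploration covariance `cov` is passed in by hand here; it only stands in for the inter-joint correlation information that the paper infers with KCCA.

```python
import numpy as np


def pi2_update(theta, epsilons, costs, h=10.0):
    """Simplified PI^2-style update: cost-weighted average of perturbations.

    theta:    (d,) current policy parameters
    epsilons: (K, d) exploration perturbations, one per rollout
    costs:    (K,) total cost of each perturbed rollout
    h:        sensitivity of the exponential weighting (assumed value)
    """
    theta = np.asarray(theta, dtype=float)
    epsilons = np.asarray(epsilons, dtype=float)
    costs = np.asarray(costs, dtype=float)
    spread = costs.max() - costs.min()
    if spread == 0.0:
        # All rollouts cost the same: weight them equally.
        weights = np.full(costs.shape, 1.0 / costs.size)
    else:
        # Normalise costs to [0, 1], then exponentiate so cheap rollouts dominate.
        expo = np.exp(-h * (costs - costs.min()) / spread)
        weights = expo / expo.sum()
    return theta + weights @ epsilons, weights


def run_toy_demo(cov, n_iters=50, n_rollouts=20, seed=0):
    """Run the update on a hypothetical two-joint quadratic via-point cost."""
    rng = np.random.default_rng(seed)
    target = np.array([0.5, 0.4])  # hypothetical via-point in joint space

    def cost(p):
        return float(np.sum((p - target) ** 2))

    theta = np.zeros(2)
    initial = cost(theta)
    for _ in range(n_iters):
        # Correlated exploration noise across the two joints.
        eps = rng.multivariate_normal(np.zeros(2), cov, size=n_rollouts)
        costs = [cost(theta + e) for e in eps]
        theta, _ = pi2_update(theta, eps, costs)
    return initial, cost(theta)
```

Calling `run_toy_demo(np.array([[0.01, 0.008], [0.008, 0.01]]))` returns the initial and final cost of the toy problem. Swapping in a diagonal `cov` gives uncorrelated exploration for comparison, though this toy setting says nothing about the results reported in the paper.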

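Keywords: trajectory planning; learning from demonstration and reinforcement learning (LfDRL); kernel canonical correlation analysis (KCCA); policy improvement with path integrals (PI^2); pseudo-covariance matrix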

Classification number: TP242.6 [Automation and Computer Technology: Detection Technology and Automatic Devices]

 
