A policy iteration method for improving robot assembly trajectory efficiency (Cited by: 1)

Authors: Qi ZHANG, Zongwu XIE, Baoshi CAO, Yang LIU

Affiliation: [1] State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China

Source: Chinese Journal of Aeronautics (中国航空学报（英文版）), 2023, Issue 3, pp. 436-448 (13 pages)

Funding: Supported by the National Natural Science Foundation of China (No. 91848202) and the Special Foundation (Pre-Station) of China Postdoctoral Science (No. 2021TQ0089).

Abstract: Bolt assembly by robots is a vital and difficult task for replacing astronauts in extravehicular activities (EVA), but the trajectory efficiency of inserting the wrench into the hex hole of the bolt still needs to be improved. In this paper, a policy iteration method based on reinforcement learning (RL) is proposed, in which the problem of improving trajectory efficiency is formulated as an RL-based objective optimization problem. First, the projection relation between raw data and the state-action space is established, and a policy iteration initialization method is designed based on this projection to provide the initial policy for iteration. Policy iteration based on the protective policy is then applied to repeatedly evaluate and optimize the action-value function of all state-action pairs until convergence is reached. To verify the feasibility and effectiveness of the proposed method, a noncontact demonstration experiment with human supervision is performed. Experimental results show that the initialization policy and the generated policy can be obtained by the policy iteration method within a limited number of demonstrations. A comparison between experiments with two different assembly tolerances shows that the convergent generated policy achieves higher trajectory efficiency than the conservative one. In addition, the method can ensure safety during the training process and improve the utilization efficiency of demonstration data.
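The evaluate-then-improve loop mentioned in the abstract can be illustrated with textbook tabular policy iteration on an action-value function. The sketch below is not the authors' method: it omits the protective policy and the demonstration-based initialization, and the three-state chain, the step cost of -1 (used here as a stand-in for penalizing long trajectories), and all names (`policy_iteration`, `P`, `R`, `gamma`) are assumptions made purely for illustration.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=1000):
    """Textbook tabular policy iteration on action values (illustrative only).

    P: transition probabilities, shape (S, A, S)
    R: expected immediate rewards, shape (S, A)
    """
    n_states, n_actions, _ = P.shape
    policy = np.zeros(n_states, dtype=int)   # arbitrary initial policy
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iters):
        # Policy evaluation: repeat the Bellman expectation backup for Q
        while True:
            V = Q[np.arange(n_states), policy]   # value of following the policy
            Q_new = R + gamma * (P @ V)          # result has shape (S, A)
            delta = np.max(np.abs(Q_new - Q))
            Q = Q_new
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to Q
        new_policy = np.argmax(Q, axis=1)
        if np.array_equal(new_policy, policy):
            break                                # policy is stable
        policy = new_policy
    return policy, Q

# Hypothetical toy problem: states 0 -> 1 -> 2, where state 2 is the goal.
# Action 0 stays in place, action 1 advances one state.
P = np.zeros((3, 2, 3))
for s in range(3):
    P[s, 0, s] = 1.0                 # stay
    P[s, 1, min(s + 1, 2)] = 1.0     # advance (goal state is absorbing)
R = np.array([[-1.0, -1.0],          # each step before the goal costs 1
              [-1.0, -1.0],
              [0.0, 0.0]])           # no cost once at the goal

policy, Q = policy_iteration(P, R)
print(policy)
```

Because every step before the goal carries the same cost, the greedy policy learns to advance in states 0 and 1, which is the shortest trajectory in this toy setting.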

Keywords: Bolt assembly; Policy initialization; Policy iteration; Reinforcement learning (RL); Robotic assembly; Trajectory efficiency

Classification: V46 [Aerospace Science and Technology: Aerospace Manufacturing Engineering]
