Demonstration-enhanced policy search for space multi-arm robot collaborative skill learning  

Authors: Tian GAO, Chengfei YUE, Xiaozhe JU, Tao LIN

Affiliations: [1] Institute of Space Science and Applied Technology, Harbin Institute of Technology, Shenzhen 518055, China; [2] Research Center of Satellite Technology, Harbin Institute of Technology, Harbin 150001, China

Source: Chinese Journal of Aeronautics (English edition), 2025, Issue 3, pp. 462-473 (12 pages)

Funding: Co-supported by the National Natural Science Foundation of China (No. 12372045); the Guangdong Basic and Applied Basic Research Foundation, China (No. 2023B1515120018); and the Shenzhen Science and Technology Program, China (No. JCYJ20220818102207015).

Abstract: The increasing complexity of on-orbit tasks places heavy demands on the flexible operation of space robotic arms, driving the development of space robots from single-arm manipulation toward multi-arm collaboration. This paper proposes a combined Learning from Demonstration (LfD) and Reinforcement Learning (RL) approach for space multi-arm collaborative skill learning. The combination addresses both the trade-off between learning efficiency and solution feasibility in LfD and the time-consuming search for an optimal solution in RL. With the prior knowledge supplied by LfD, space robotic arms can learn efficiently, under guidance, in a high-dimensional state-action space. Specifically, an LfD approach based on Probabilistic Movement Primitives (ProMP) is first used to encode and reproduce the demonstrated actions, generating a distribution that initializes the policy. Then, in the RL stage, a Relative Entropy Policy Search (REPS) algorithm modified for continuous state-action space is employed for further policy improvement. Importantly, the learned behaviors preserve and reflect the characteristics of the demonstrations. In addition, a series of supplementary policy search mechanisms is designed to accelerate exploration. The effectiveness of the proposed method is verified both theoretically and experimentally, and comparisons with state-of-the-art methods confirm its superior performance.
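The two-stage idea in the abstract (a demonstration-derived distribution as the initial policy, followed by a relative-entropy-bounded update) can be illustrated with a toy sketch. This is not the paper's modified REPS or its ProMP encoding: it assumes a single scalar policy parameter standing in for the ProMP weight vector, a plain episodic REPS dual minimized by grid search, and a made-up reward. All function names and parameters (`reps_weights`, `fit_gaussian`, `improve_policy`, `epsilon`) are hypothetical.

```python
import math
import random


def reps_weights(returns, epsilon=0.5):
    """Episodic REPS sample weights for a KL bound `epsilon` (assumed value)."""
    r_max = max(returns)
    n = len(returns)

    def dual(eta):
        # g(eta) = eta*epsilon + eta*log(mean(exp((R - r_max)/eta))) + r_max
        mean_exp = sum(math.exp((r - r_max) / eta) for r in returns) / n
        return eta * epsilon + eta * math.log(mean_exp) + r_max

    # Coarse log-spaced grid over the temperature eta in [1e-3, 1e3];
    # a real implementation would use a proper 1-D optimizer.
    grid = [10 ** (k / 20.0) for k in range(-60, 61)]
    eta = min(grid, key=dual)
    w = [math.exp((r - r_max) / eta) for r in returns]
    total = sum(w)
    return [wi / total for wi in w], eta


def fit_gaussian(samples, weights=None):
    """Weighted maximum-likelihood fit of a 1-D Gaussian (mean, std)."""
    n = len(samples)
    weights = weights or [1.0 / n] * n
    mean = sum(w * s for w, s in zip(weights, samples))
    var = sum(w * (s - mean) ** 2 for w, s in zip(weights, samples))
    return mean, math.sqrt(var) + 1e-6  # small floor keeps sampling valid


def improve_policy(demos, reward, iterations=20, n_samples=50, seed=0):
    """Initialize the policy from demonstrations, then refine it with REPS."""
    rng = random.Random(seed)
    mean, std = fit_gaussian(demos)  # stand-in for the LfD stage
    for _ in range(iterations):
        thetas = [rng.gauss(mean, std) for _ in range(n_samples)]
        returns = [reward(t) for t in thetas]
        weights, _ = reps_weights(returns)
        mean, std = fit_gaussian(thetas, weights)  # weighted policy update
    return mean, std


if __name__ == "__main__":
    # Hypothetical demonstrations cluster near 1.0; the reward peaks at 1.3.
    demos = [0.8, 1.0, 1.2, 0.9, 1.1]
    print(improve_policy(demos, lambda t: -(t - 1.3) ** 2))
```

Because each update is bounded in relative entropy, the search distribution moves away from the demonstration-based initialization gradually, which is one way to read the abstract's claim that learned behaviors retain the characteristics of the demonstrations.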

Keywords: Space multi-arm collaboration; Demonstrations; Reinforcement Learning; Probabilistic Movement Primitives; Relative Entropy Policy Search; Policy search mechanism

Classification: TP242 [Automation and Computer Technology: Detection Technology and Automatic Equipment]

 
