ROBOTIC TRAJECTORY PLANNING WITH REINFORCEMENT LEARNING METHOD BASED ON ACTOR-CRITIC-CURIOSITY FRAMEWORK

Authors: Jia Lukuan; Xie Jiexin; Yue Xiaotian; Deng Fei; Zhu Deliang; Guo Shijie (College of Mechanical Engineering, Hebei University of Technology, Tianjin 300130, China; Academy for Engineering & Technology, Fudan University, Shanghai 200433, China; International College, Zhengzhou University, Zhengzhou 450001, Henan, China)

Affiliations: [1] College of Mechanical Engineering, Hebei University of Technology, Tianjin 300130, China [2] Academy for Engineering & Technology, Fudan University, Shanghai 200433, China [3] International College, Zhengzhou University, Zhengzhou 450001, Henan, China

Source: Computer Applications and Software, 2025, No. 3, pp. 268-273 (6 pages)

Fund: Shanghai Municipal Science and Technology Major Project (2021SHZDZX0103).

Abstract: Existing deep reinforcement learning (DRL) based methods for robot trajectory planning often suffer from low learning efficiency and are prone to locally optimal solutions. To address these problems, a curiosity network is designed and, building on it, an actor-critic-curiosity (A-C-C) framework is proposed. A-C-C enables the agent to handle problems in a more human-like way, paying more attention to the process of exploration than to its outcome. By promoting the exploration of unknown regions, the A-C-C framework effectively improves the learning efficiency of DRL methods and avoids locally optimal solutions. Experimental results show that the A-C-C framework can be combined with different reward functions, accelerating exploration efficiency by 43.6%-101.2% and improving the convergence mean by 4.8%-6.4%.
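The record does not include the paper's implementation details, so the following is only a minimal sketch of the general idea the abstract describes: a curiosity network supplying an intrinsic reward inside an actor-critic loop. It follows the common ICM-style forward-model design, where prediction error serves as a novelty bonus; the layer sizes, the weight `beta`, and the 6-DoF state/action dimensions are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch (assumed ICM-style design, not the paper's exact network):
# a curiosity module's forward-model prediction error is added to the task
# reward, so the agent is paid for visiting poorly modelled (novel) regions
# regardless of task outcome, i.e. for the exploration process itself.
import torch
import torch.nn as nn

class CuriosityModule(nn.Module):
    """Forward model: predicts the next state from (state, action).
    Its prediction error is used as the intrinsic (curiosity) reward."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.forward_model = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def intrinsic_reward(self, state, action, next_state):
        pred = self.forward_model(torch.cat([state, action], dim=-1))
        # Larger prediction error -> less-visited region -> larger bonus.
        return 0.5 * (pred - next_state).pow(2).sum(dim=-1)

class ActorCritic(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # continuous joint actions
        )
        self.critic = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

state_dim, action_dim = 12, 6           # e.g. a 6-DoF arm; illustrative only
curiosity = CuriosityModule(state_dim, action_dim)
ac = ActorCritic(state_dim, action_dim)
beta = 0.2                              # intrinsic-reward weight (assumed)

s = torch.randn(1, state_dim)
a = ac.actor(s)
v = ac.critic(s)                        # critic's value estimate for s
s_next = torch.randn(1, state_dim)      # would come from the environment
r_ext = torch.tensor([1.0])             # task (extrinsic) reward
r_int = curiosity.intrinsic_reward(s, a, s_next)
r_total = r_ext + beta * r_int.detach() # detach: a reward signal, not a loss term
```

The agent is then trained on `r_total` with the usual actor-critic updates; detaching the intrinsic term keeps the curiosity network's own training (minimizing its prediction error) separate from the policy's reward signal.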

Keywords: deep reinforcement learning; robot trajectory planning; optimization framework

Classification: TP3 [Automation and Computer Technology - Computer Science and Technology]

 
