Real-time Obstacle Avoidance of Robotic Manipulator Based on Improved Q-learning


Authors: Wu Daiyan; Liu Shilin [2]

Affiliations: [1] Department of Mechanical and Electrical Engineering, Anhui Lu'an Technician College, Lu'an 237001, China; [2] School of Electrical Engineering, Anhui Polytechnic University, Wuhu 241000, China

Source: Journal of Taizhou University, 2022, No. 6, pp. 13-20 (8 pages)

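Funding: Major Project of Natural Science Research in Universities of Anhui Province (KJ2018ZD066); Key Project of Natural Science Research in Universities of Anhui Province (KJ2019A1184).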

Abstract: To improve the adaptability of a robotic manipulator in real-time obstacle avoidance, a control method based on improved Q-learning is proposed. First, deep reinforcement learning is used to reward and penalize the manipulator's actions, and a deep neural network learns the feature representation. Then, a Markov decision process is defined by the state and action sets and the environment's transition probability matrix. At the same time, the normalized advantage function is combined with the Q-learning algorithm to support robotic systems defined in continuous spaces. Experimental results show that the proposed method overcomes the slow convergence of Q-learning, achieves real-time obstacle avoidance for a high-performance manipulator, and helps humans and robots coexist safely.

Keywords: manipulator; Markov decision process; deep reinforcement learning; Q-learning; normalized advantage function

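Classification: TP241 [Automation and Computer Technology: Detection Technology and Automatic Equipment]; TP18 [Automation and Computer Technology: Control Science and Engineering]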
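
The abstract states that a normalized advantage function is combined with Q-learning so that the method can handle continuous action spaces, but it gives no formulas. As a rough illustration only, the sketch below uses the commonly used form of that idea: Q(s, a) = V(s) + A(s, a), where A(s, a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)) and P(s) = L(s) L(s)^T is positive definite, so the greedy action is simply mu(s). The function names, argument layout, and the exponentiated diagonal are assumptions made for this example; in practice V, mu, and L would be outputs of a neural network, and the paper's actual network, reward design, and hyperparameters are not described in this record.

```python
import numpy as np


def naf_q_value(action, value, mu, l_entries):
    """Q(s, a) = V(s) - 0.5 * (a - mu)^T P (a - mu), with P = L L^T.

    `value`, `mu`, and `l_entries` stand in for network outputs at one state
    (hypothetical names, not taken from the paper).
    """
    action = np.asarray(action, dtype=float)
    mu = np.asarray(mu, dtype=float)
    n = mu.shape[0]

    # Fill a lower-triangular matrix from a flat vector of n * (n + 1) / 2 entries.
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = l_entries
    # Assumption: exponentiate the diagonal so that P is positive definite.
    L[np.diag_indices(n)] = np.exp(np.diag(L))
    P = L @ L.T

    diff = action - mu
    advantage = -0.5 * diff @ P @ diff  # always <= 0, equal to 0 at action == mu
    return value + advantage


def td_target(reward, next_value, gamma=0.99, done=False):
    """One-step Q-learning target; max_a Q(s', a) equals V(s') in this form."""
    return reward + (0.0 if done else gamma * next_value)
```

Because the advantage term is a non-positive quadratic in the action, the maximization over actions that ordinary Q-learning needs has a closed-form answer, which is what makes the approach usable for continuous joint commands.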

 
