Authors: WANG Fengying [1,2]; CHEN Ying; YUAN Shuai; DU Liming [1,2] (School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168, China; School of Information Engineering, Suqian University, Suqian, Jiangsu 223800, China)
Affiliations: [1] School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168; [2] School of Information Engineering, Suqian University, Suqian, Jiangsu 223800
Source: Computer Engineering and Applications, 2024, No. 19, pp. 158-166 (9 pages)
Funding: Applied Basic Research Program of Liaoning Province (2023JH2/101300212); Suqian University Scientific Research Start-up Fund for Introduced Talents (2022XRC091).
Abstract: To address the low sample utilization, sparse rewards, and slow stabilization of the network model that the deep deterministic policy gradient (DDPG) algorithm exhibits in path planning, an improved DDPG algorithm is proposed. A self-attention mechanism is applied to the image information captured by the robot's camera sensor, and the dot-product method is used to compute the correlation between images, so that higher weights are focused precisely on obstacle information. In complex environments, the robot's lack of experience makes positive-feedback rewards difficult to obtain, which limits its exploration ability. By combining DDPG with hindsight experience replay (HER), a DDPG-HER algorithm is proposed that effectively exploits both positive and negative feedback, enabling the robot to learn appropriate rewards from successful and failed experiences alike. Static and dynamic simulation environments are built in Gazebo for training and testing. The experimental results show that the proposed algorithm significantly improves sample utilization, accelerates the stabilization of the network model, and alleviates the sparse-reward problem, enabling the robot to avoid obstacles efficiently and reach the target point during path planning in unknown environments.
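The dot-product self-attention described in the abstract can be illustrated with a generic sketch. This is not the paper's implementation: the projection matrices `w_q`, `w_k`, `w_v`, the NumPy formulation, and the treatment of image features as a sequence of flattened patch vectors are all illustrative assumptions; the paper applies the mechanism to camera-sensor image features inside its network.

```python
import numpy as np

def dot_product_self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a set of feature vectors.

    x: (n, d) array of n feature vectors (e.g. flattened image patches).
    w_q, w_k, w_v: (d, d_k) learned projection matrices (random here).
    Returns attended features of shape (n, d_k).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Pairwise similarity between queries and keys, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Softmax over keys (shifted by the row max for numerical stability),
    # so each row of `weights` sums to 1 and high-correlation entries
    # (e.g. obstacle features) receive the largest weights.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

In practice the projections are trained jointly with the DDPG networks; the sketch only shows how the dot product turns feature correlation into attention weights.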
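The HER idea of learning from failed episodes can likewise be sketched in a minimal, hypothetical replay buffer. The class name, the sparse 0/-1 reward, and the "final" relabeling strategy (using the episode's last reached state as the substitute goal) are assumptions for illustration, not the paper's exact design.

```python
import random
from collections import deque

class HindsightReplayBuffer:
    """Minimal hindsight experience replay (HER) buffer, 'final' strategy.

    Each transition is (state, action, next_state, goal). Every transition
    is stored twice: once with the original goal (usually a negative,
    sparse reward) and once relabelled with the goal the agent actually
    reached, so even a failed episode yields positive-reward samples.
    """

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    @staticmethod
    def reward(achieved, goal):
        # Sparse reward: 0 on reaching the goal, -1 otherwise.
        return 0.0 if achieved == goal else -1.0

    def store_episode(self, episode):
        achieved_goal = episode[-1][2]  # final state the agent reached
        for state, action, next_state, goal in episode:
            # Original transition with the intended goal.
            self.buffer.append(
                (state, action, next_state, goal,
                 self.reward(next_state, goal)))
            # Hindsight copy: pretend the reached state was the goal.
            self.buffer.append(
                (state, action, next_state, achieved_goal,
                 self.reward(next_state, achieved_goal)))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```

In the DDPG-HER setting, minibatches sampled from such a buffer feed the actor and critic updates, which is how the relabelled positive rewards counteract reward sparsity.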
Keywords: deep reinforcement learning; deep deterministic policy gradient (DDPG); hindsight experience replay (HER); self-attention mechanism; robot path planning
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]