Authors: WANG Fengying [1,2]; CHEN Ying; YUAN Shuai; DU Liming [1,2] (School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168, China; School of Information Engineering, Suqian University, Suqian, Jiangsu 223800, China)
Affiliations: [1] School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168; [2] School of Information Engineering, Suqian University, Suqian, Jiangsu 223800
Source: Computer Engineering and Applications, 2024, No. 19, pp. 158-166 (9 pages)
Funding: Applied Basic Research Program of Liaoning Province (2023JH2/101300212); Suqian University Scientific Research Start-up Fund for Introduced Talents (2022XRC091).
Abstract: To address the low sample utilization, sparse rewards, and slow stabilization of the network model that the deep deterministic policy gradient (DDPG) algorithm exhibits in path planning, an improved DDPG algorithm is proposed. A self-attention mechanism is applied to the image information captured by the robot's camera sensor, and the dot-product method is used to compute the correlation between images, so that higher weights are focused precisely on obstacle information. In complex environments, the robot's lack of experience makes positive-feedback rewards difficult to obtain, which limits its exploration ability. By combining DDPG with hindsight experience replay (HER), a DDPG-HER algorithm is proposed that effectively exploits both positive and negative feedback, enabling the robot to learn appropriate rewards from successful and failed experiences alike. Static and dynamic simulation environments are built in Gazebo for training and testing. The experimental results show that the proposed algorithm significantly improves sample utilization, accelerates the stabilization of the network model, and alleviates the sparse-reward problem, enabling the robot to avoid obstacles efficiently and reach the target point during path planning in unknown environments.
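The dot-product self-attention described in the abstract can be illustrated with a generic sketch. This is not the paper's implementation: the projection matrices `w_q`, `w_k`, `w_v`, the NumPy formulation, and the treatment of image features as a sequence of flattened patch vectors are all illustrative assumptions; the paper applies the mechanism to camera-sensor image features inside its network.

```python
import numpy as np

def dot_product_self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a set of feature vectors.

    x: (n, d) array of n feature vectors (e.g. flattened image patches).
    w_q, w_k, w_v: (d, d_k) learned projection matrices (random here).
    Returns attended features of shape (n, d_k).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Pairwise similarity between queries and keys, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Softmax over keys (shifted by the row max for numerical stability),
    # so each row of `weights` sums to 1 and high-correlation entries
    # (e.g. obstacle features) receive the largest weights.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

In practice the projections are trained jointly with the DDPG networks; the sketch only shows how the dot product turns feature correlation into attention weights.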
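The HER idea of learning from failed episodes can likewise be sketched in a minimal, hypothetical replay buffer. The class name, the sparse 0/-1 reward, and the "final" relabeling strategy (using the episode's last reached state as the substitute goal) are assumptions for illustration, not the paper's exact design.

```python
import random
from collections import deque

class HindsightReplayBuffer:
    """Minimal hindsight experience replay (HER) buffer, 'final' strategy.

    Each transition is (state, action, next_state, goal). Every transition
    is stored twice: once with the original goal (usually a negative,
    sparse reward) and once relabelled with the goal the agent actually
    reached, so even a failed episode yields positive-reward samples.
    """

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    @staticmethod
    def reward(achieved, goal):
        # Sparse reward: 0 on reaching the goal, -1 otherwise.
        return 0.0 if achieved == goal else -1.0

    def store_episode(self, episode):
        achieved_goal = episode[-1][2]  # final state the agent reached
        for state, action, next_state, goal in episode:
            # Original transition with the intended goal.
            self.buffer.append(
                (state, action, next_state, goal,
                 self.reward(next_state, goal)))
            # Hindsight copy: pretend the reached state was the goal.
            self.buffer.append(
                (state, action, next_state, achieved_goal,
                 self.reward(next_state, achieved_goal)))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```

In the DDPG-HER setting, minibatches sampled from such a buffer feed the actor and critic updates, which is how the relabelled positive rewards counteract reward sparsity.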
Keywords: deep reinforcement learning; deep deterministic policy gradient (DDPG); hindsight experience replay (HER); self-attention mechanism; robot path planning
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]