Authors: WANG Wenlong; ZHANG Fan (School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 200335, China)
Affiliation: [1] School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 200335, China
Source: Agricultural Equipment & Vehicle Engineering, 2023, No. 9, pp. 46-51 (6 pages)
Funding: Supported by the Science and Technology Support Program in the Biomedical Field of the Shanghai Municipal Science and Technology Commission (17441901200).
Abstract: To address the complex coding and poor environmental adaptability of traditional manipulator control methods, manipulator motion control is studied by exploiting the ability of deep reinforcement learning to actively explore unknown environments. To improve the manipulator's adaptability to its environment and reduce environmental interference with its control, a distributed policy gradient algorithm is adopted and the reward function is redesigned. Comparative experiments against the deep deterministic policy gradient (DDPG) algorithm show that the proposed approach greatly reduces training time, raises the maximum reward the manipulator can attain in the simulation environment, and enables the end effector to reach the target position quickly and accurately.
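The abstract mentions a redesigned reward function for the reaching task but does not give its form. As a purely illustrative sketch (not the paper's actual design), a common dense shaping for manipulator reaching penalizes the end-effector-to-goal distance and adds a sparse bonus on success; the threshold and bonus values below are assumptions:

```python
import numpy as np

def reach_reward(ee_pos, target_pos, threshold=0.05, bonus=10.0):
    """Illustrative distance-based reward for a reach task.

    ee_pos, target_pos: 3-D end-effector and goal positions.
    Returns (reward, done): the negative Euclidean distance acts as
    dense shaping, plus a sparse bonus once the end effector is
    within `threshold` of the goal.
    """
    dist = float(np.linalg.norm(np.asarray(ee_pos) - np.asarray(target_pos)))
    done = dist < threshold
    reward = -dist + (bonus if done else 0.0)
    return reward, done
```

Dense shaping of this kind is widely used because a purely sparse success signal gives the policy gradient almost no learning signal early in training.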
Keywords: manipulator motion control; deep reinforcement learning; distributed policy gradient algorithm; reward function redesign
CLC number: TP242 [Automation and Computer Technology — Detection Technology and Automatic Equipment]