Authors: XU Jianming (徐建明) [1], CHEN Fu (陈阜), DONG Jianwei (董建伟) (College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023)
Source: Chinese High Technology Letters (《高技术通讯》), 2023, No. 1, pp. 63-71 (9 pages)
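Funding: Supported by the National Natural Science Foundation of China and Zhejiang Provincial Natural Science Foundation Joint Fund for the Integration of Industrialization and Informatization (U1709213) and by the General Program of the National Natural Science Foundation of China (61374103).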
Abstract: This paper addresses the hole-finding operation in robotic automatic charging and studies a robot hole-finding strategy based on the soft actor-critic (SAC) deep reinforcement learning algorithm. A policy controller is designed on the actor-critic framework; it takes the charging-gun head pose and contact force information as input and outputs motion in the XY plane of the end gun-head coordinate frame. The policy controller comprises five neural networks: one actor network, two target critic networks, and two critic networks. The actor network outputs the hole-finding action, the target critic networks evaluate the value of the hole-finding action in the next hole-finding state, and the critic networks evaluate the value of the hole-finding action in the current hole-finding state. Following the double-Q trick, the smaller of the two target critic outputs is used to update the critic networks, and the smaller of the two critic outputs is used to update the actor network, thereby training the policy controller. A hybrid force/position control structure converts the XY-plane displacement output by the actor network into a desired translational velocity, which is combined with the desired velocity output by the Z-axis force-tracking admittance controller to form the robot's desired velocity and guide the charging gun to the hole. Simulations and experiments verify the effectiveness of the proposed method.
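The double-Q update and the velocity composition described in the abstract can be illustrated with a minimal sketch. It follows the standard SAC formulation rather than the authors' implementation, which the record does not show. The function names, the discount factor `gamma`, the entropy temperature `alpha`, and the conversion of displacement to velocity by dividing by a control period `dt` are all assumptions made for illustration; the Z-axis admittance velocity is taken as a given input because the abstract does not specify the admittance law.

```python
from typing import Sequence, Tuple


def critic_target(reward: float, done: bool,
                  q1_target_next: float, q2_target_next: float,
                  log_prob_next: float,
                  gamma: float = 0.99, alpha: float = 0.2) -> float:
    """Soft Bellman target for both critics (standard SAC form).

    Uses the smaller of the two target-critic values for the next state,
    as in the double-Q trick. gamma and alpha are assumed values.
    """
    min_q_next = min(q1_target_next, q2_target_next)
    # Entropy-regularised value of the next state.
    soft_value_next = min_q_next - alpha * log_prob_next
    return reward + (0.0 if done else gamma * soft_value_next)


def actor_loss(q1: float, q2: float, log_prob: float,
               alpha: float = 0.2) -> float:
    """Per-sample actor loss (standard SAC form).

    Uses the smaller of the two critic values for the current state.
    """
    return alpha * log_prob - min(q1, q2)


def compose_velocity(dxy: Sequence[float], dt: float,
                     vz_admittance: float) -> Tuple[float, float, float]:
    """Combine the XY action with the Z-axis admittance velocity.

    Assumption: the XY displacement is converted to a translational
    velocity by dividing by the control period dt.
    """
    vx = dxy[0] / dt
    vy = dxy[1] / dt
    return (vx, vy, vz_admittance)
```

In a full training loop these scalar functions would be applied per sample over a replay-buffer batch, with the two target critics tracking the critics through soft (Polyak) updates as in standard SAC.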
Keywords: robot hole-finding; deep reinforcement learning; soft actor-critic (SAC) algorithm; neural network; force control
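Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TP242 [Automation and Computer Technology: Control Science and Engineering]; U491.8 [Transportation Engineering: Transportation Planning and Management]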