Research on a Hole-Finding Strategy for a Charging Gun Based on the SAC Deep Reinforcement Learning Algorithm

Authors: XU Jianming [1]; CHEN Fu; DONG Jianwei (College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023)

Affiliation: [1] College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023

Source: Chinese High Technology Letters (《高技术通讯》), 2023, No. 1, pp. 63-71 (9 pages)

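Funding: Supported by the National Natural Science Foundation of China and Zhejiang Provincial Natural Science Foundation Joint Fund, Integration of Industrialization and Informatization project (U1709213), and by the General Program of the National Natural Science Foundation of China (61374103).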

Abstract: This paper studies a robot hole-finding strategy based on the soft actor-critic (SAC) deep reinforcement learning algorithm for the hole-finding operation in robotic automatic charging tasks. A policy controller is designed on the actor-critic framework; it takes the gun-head pose and contact force information as input and outputs a motion in the XY plane of the end gun-head coordinate frame. The policy controller comprises five neural networks: one actor network, two target critic networks, and two critic networks. The actor network outputs the hole-finding action, the target critic networks output the value estimate of the hole-finding action in the next hole-finding state, and the critic networks output the value estimate of the hole-finding action in the current hole-finding state. Following the double-Q trick, the smaller of the two target critic outputs is used to update the critic networks, and the smaller of the two critic outputs is used to update the actor network, thereby training the policy controller. A hybrid force/position control structure is adopted: the XY-plane displacement action output by the actor network is converted into a desired translational velocity, which is combined with the desired velocity output by the Z-axis force-tracking admittance controller to form the desired robot velocity that guides the charging gun to find the hole. Simulations and experiments verify the effectiveness of the proposed method.
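The update rule and the velocity composition summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the discount factor `gamma`, the entropy temperature `alpha`, the control period `dt`, and the use of the standard SAC entropy-regularized target and actor loss are all assumptions, since the abstract gives no formulas or hyperparameters. The admittance controller itself is not modeled; its Z-axis velocity output is passed in as a plain number.

```python
import numpy as np


def critic_target(reward, done, q1_targ_next, q2_targ_next, logp_next,
                  gamma=0.99, alpha=0.2):
    """Regression target shared by both critic networks.

    Uses the smaller of the two target-critic outputs for the next state
    (double-Q trick). The entropy term -alpha * logp_next is the standard
    SAC form and is an assumption here, as are gamma and alpha.
    """
    min_q_next = np.minimum(q1_targ_next, q2_targ_next)
    return reward + gamma * (1.0 - done) * (min_q_next - alpha * logp_next)


def actor_loss(q1_pi, q2_pi, logp_pi, alpha=0.2):
    """Actor loss averaged over a batch.

    Uses the smaller of the two (non-target) critic outputs, evaluated at
    actions sampled from the current policy in the current state.
    """
    min_q_pi = np.minimum(q1_pi, q2_pi)
    return float(np.mean(alpha * np.asarray(logp_pi) - min_q_pi))


def compose_velocity(xy_displacement, dt, vz_admittance):
    """Combine the actor's XY displacement with the Z-axis admittance output.

    The XY displacement action is divided by a hypothetical control period
    dt to obtain a desired translational velocity; vz_admittance stands in
    for the output of the Z-axis force-tracking admittance controller.
    """
    vx, vy = np.asarray(xy_displacement, dtype=float) / dt
    return np.array([vx, vy, vz_admittance])


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    y = critic_target(reward=1.0, done=0.0, q1_targ_next=2.0,
                      q2_targ_next=3.0, logp_next=-1.0, gamma=0.9, alpha=0.5)
    print("critic target:", y)
    print("desired velocity:", compose_velocity([0.002, -0.001], 0.01, 0.005))
```

Taking the minimum of two value estimates is a common way to counteract overestimation bias in Q-learning; here the target critics supply the minimum for the critic update, while the online critics supply it for the actor update, matching the split described in the abstract.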

Keywords: robot hole finding; deep reinforcement learning; soft actor-critic (SAC) algorithm; neural network; force control

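Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TP242 [Automation and Computer Technology: Control Science and Engineering]; U491.8 [Transportation Engineering: Transportation Planning and Management]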
