Adaptive walking control for bipedal robots based on enhanced PPO algorithm

Cited by: 1

Authors: WU Wanyi (吴万毅), LIU Fanghua (刘芳华)[1], GUO Wenlong (郭文龙) (School of Mechanical Engineering, Jiangsu University of Science and Technology, Zhenjiang 212000, China)

Affiliation: [1] School of Mechanical Engineering, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212000, China

Source: Journal of Yangzhou University (Natural Science Edition), 2023, No. 6, pp. 44-50 (7 pages)

Funding: National Natural Science Foundation of China (Grant No. 62002141).

Abstract: To address unstable gait when a bipedal robot walks in unknown environments, a bipedal robot control method based on proximal policy optimization (PPO) is proposed. First, an action network and a value network are constructed, and long short-term memory (LSTM) is introduced to reduce the deviation between the estimated state values and the expected values as the bipedal robot interacts with the unknown environment. Second, an attention mechanism is introduced into the action network to adaptively adjust the weight coefficients that the neural network learns autonomously, which improves learning efficiency and yields stable gaits adapted to different environments. Finally, the effectiveness of the proposed algorithm is verified by simulation experiments. The results show that the improved PPO algorithm converges faster, learns more efficiently, and effectively improves the stability of adaptive walking for bipedal robots.
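The abstract does not give the paper's loss functions, network sizes, or attention design. As a rough, hedged illustration of the pieces it names, the sketch below implements the standard PPO clipped surrogate objective, a mean-squared critic loss (one plausible reading of the "deviation between estimated and expected values" that the LSTM-equipped value network is meant to reduce), and a generic softmax attention pooling. The function names, the default clip range eps = 0.2, and the form of attention_pool are assumptions for illustration only, not the authors' implementation; in the paper's setting, the log-probabilities and values would be produced by the LSTM-based action and value networks.

```python
import math
from typing import List, Sequence


def ppo_clip_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                       advantages: Sequence[float], eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective L^CLIP (to be maximized).

    eps = 0.2 is a commonly used clip range, assumed here for illustration.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: take the smaller of unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def value_loss(values: Sequence[float], returns: Sequence[float]) -> float:
    """Mean squared error between critic estimates V(s_t) and target returns."""
    return sum((v - r) ** 2 for v, r in zip(values, returns)) / len(values)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_pool(features: Sequence[Sequence[float]],
                   scores: Sequence[float]) -> List[float]:
    """Hypothetical generic soft attention: weight feature vectors by
    softmax(scores) and sum them into one vector."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
```

For example, with a probability ratio of 1.5, an advantage of 2.0 and eps = 0.2, the clipped term 1.2 x 2.0 = 2.4 is smaller than the unclipped 3.0, so the objective caps the gain at 2.4; with an advantage of -2.0 the minimum selects the unclipped -3.0, which keeps the update conservative in both directions.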

Keywords: proximal policy optimization algorithm; long short-term memory; attention mechanism; bipedal walking robot; neural network

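CLC numbers: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]; TP242.6 [Automation and Computer Technology / Control Science and Engineering]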
