Research on Application of Deep Reinforcement Learning TD3 Algorithm in Inverted Pendulum System

Cited by: 3

Authors: HE Weidong [1]; LIU Xiaochen; ZHANG Yinghui [1]; YAO Shixuan

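Affiliations: [1] School of Mechanical Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China; [2] College of Software, Dalian Foreign Language University, Dalian 116044, Liaoning, China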

Source: Journal of Dalian Jiaotong University, 2023, No. 1, pp. 38-44 (7 pages)

Abstract: To address the limitations of existing control algorithms for inverted pendulum systems, this paper combines reinforcement learning and deep learning and proposes an end-to-end control method for the inverted pendulum based on the twin delayed deep deterministic policy gradient (TD3) algorithm. First, the dynamic model of the inverted pendulum is used to build a virtual simulation environment, and a sparse reward function is designed. Second, a deep neural network is used to build an end-to-end control model that maps the inverted pendulum state input to the action output; the network structure and parameters are determined by analyzing the characteristics of the inverted pendulum. Finally, the model generated in the virtual simulation environment is transferred to a physical inverted pendulum platform and optimized there. Experimental results show that the model generated by this method effectively establishes the mapping between the inverted pendulum state and the executed action, and the method offers a useful reference for motion control.
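The abstract names two ingredients, a sparse reward function and the TD3 algorithm, without giving their details. The following is a minimal sketch of what these pieces commonly look like, not the authors' implementation. The function names, the angle and position thresholds, and the noise parameters are all assumptions chosen for illustration; only the general TD3 mechanics (the minimum over two target critics and clipped noise on the target action) are standard.

```python
import numpy as np


def sparse_reward(theta, x, theta_limit=0.2, x_limit=2.4):
    """Hypothetical sparse reward: 1 while the pole is near upright and the
    cart is on the track, 0 otherwise. The thresholds are assumed values."""
    upright = abs(theta) <= theta_limit
    on_track = abs(x) <= x_limit
    return 1.0 if (upright and on_track) else 0.0


def smoothed_target_action(mu_next, rng, sigma=0.2, noise_clip=0.5, a_max=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target
    actor's action, then clip the result to the valid action range."""
    noise = rng.normal(0.0, sigma, size=np.shape(mu_next))
    noise = np.clip(noise, -noise_clip, noise_clip)
    return np.clip(mu_next + noise, -a_max, a_max)


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)
```

In a full TD3 training loop, both critics would be regressed toward `td3_target`, and the actor and target networks would be updated less often than the critics (the "delayed" part of the name).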

Keywords: deep reinforcement learning; inverted pendulum control; TD3; end-to-end; sparse reward function

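Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; O314 [Automation and Computer Technology: Control Science and Engineering]; TP273 [Science: General Mechanics and Foundations of Mechanics]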
