Authors: ZHANG Xing-Long, LU Yang, LI Wen-Zhang, XU Xin (College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073)
Source: Acta Automatica Sinica, 2023, No. 12, pp. 2481-2492 (12 pages)
Funding: Supported by the National Key Research and Development Program of China (2018YFB1305105) and the National Natural Science Foundation of China (62003361, 61825305).
Abstract: This paper presents a receding horizon reinforcement learning (RHRL) algorithm for high-accuracy lateral control of intelligent vehicles. The overall lateral control is composed of a feedforward term, computed directly from the curvature of the reference path and the dynamic model, and a feedback term generated by solving an optimal tracking control problem with the proposed RHRL algorithm. RHRL adopts a receding horizon optimization mechanism that decomposes the infinite-horizon optimal control problem into a sequence of finite-horizon problems. Unlike existing finite-horizon actor-critic learning algorithms, in each prediction horizon RHRL uses a time-independent actor-critic structure to learn the optimal value function and control policy. Also, in contrast to model predictive control (MPC), which computes an open-loop control sequence, the controller learned by RHRL is an explicit state-feedback control policy that can be deployed directly offline or learned and deployed synchronously online. Moreover, the convergence of the RHRL algorithm in each prediction horizon is proven, and the stability of the closed-loop system is analyzed. Simulation studies on a structured road show that the proposed RHRL algorithm outperforms current state-of-the-art methods. Experiments on an intelligent driving platform built on a Hongqi E-HS3 electric car show that RHRL performs better than the pure pursuit method on a closed structured city road, and exhibits strong adaptability to road conditions and satisfactory control performance on an undulating country gravel road.
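The feedforward term described in the abstract maps reference-path curvature to a steering command. As a rough illustration only: the sketch below uses a simplified kinematic bicycle model (the paper itself derives its feedforward from a dynamic vehicle model, which adds slip-dependent corrections at speed); the function name and the wheelbase value are illustrative assumptions, not from the paper.

```python
import math

def feedforward_steer(curvature: float, wheelbase: float) -> float:
    """Kinematic-bicycle feedforward steering angle (rad) for a reference
    path of the given curvature (1/m).

    Simplified sketch: delta_ff = atan(L * kappa). The paper's feedforward
    is computed from a dynamic model, so this omits tire-slip effects.
    """
    return math.atan(wheelbase * curvature)

# Illustrative values: 2.9 m wheelbase, 100 m turn radius (kappa = 0.01 1/m)
delta_ff = feedforward_steer(1.0 / 100.0, 2.9)
```

On a straight segment (zero curvature) the feedforward vanishes and the RHRL feedback policy alone regulates the lateral error, which matches the feedforward/feedback decomposition stated in the abstract.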