Variable-parameter MPC Multi-objective Control for Intelligent Vehicle Path Tracking Based on Reinforcement Learning

Original title: 基于强化学习的智能车辆路径跟踪变参数MPC多目标控制 (cited by: 3)

Authors: WANG Hong-bo (汪洪波); WANG Chun-yang (王春阳); ZHAO Lin-feng (赵林峰); HU Yan-ping (胡延平)

Affiliation: [1] School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230009, Anhui, China

Source: China Journal of Highway and Transport (《中国公路学报》), 2024, No. 3, pp. 157-169 (13 pages)

Funding: National Natural Science Foundation of China (U22A20246, 52372382); Natural Science Foundation of Hefei (2022008).

Abstract: To address the loss of tracking accuracy and stability that intelligent vehicles suffer when driving conditions change, this study proposes a multi-objective control strategy built on a variable-parameter model predictive control (MPC) algorithm tuned by reinforcement learning, which adaptively tunes the parameters of the path tracking control system. A linear time-varying MPC controller is designed from a vehicle dynamics model to obtain the optimal front-wheel steering angle and additional yaw moment. Based on the Actor-Critic reinforcement learning architecture, a Deep Deterministic Policy Gradient (DDPG) agent and a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent are designed to tune the control parameters, with a reward function that targets tracking accuracy and stability. Two typical simulation scenarios, a docking road and a variable-curvature road, are built to verify the algorithm. On the docking road, the controller's prediction horizon and weight matrix are adjusted promptly as the road adhesion coefficient changes; on the variable-curvature road, they are adjusted promptly as the road curvature changes. In joint simulations using MATLAB/Simulink, CarSim, and Python, the reinforcement learning-tuned MPC is compared with a fixed-parameter MPC and a fuzzy-tuned MPC. The results show that the reinforcement learning approach does the most to improve path tracking accuracy under different road conditions while maintaining vehicle safety. On the docking road, compared with the fixed-parameter MPC and the fuzzy-tuned MPC, the reinforcement learning-tuned MPC reduces the mean lateral deviation by 99.8% and 97.6%, respectively, and the mean front-wheel steering angle rate by 99.7% and 77.0%, respectively. On the variable-curvature road, it reduces the mean lateral deviation by 79.6% and 90.8%, respectively, and the mean front-wheel steering angle rate by 40.6% and 2.6%, respectively. The proposed reinforcement learning-based variable-parameter MPC multi-objective control therefore addresses the stability and tracking accuracy problems of path tracking under changing driving conditions, and offers one approach to path tracking control in complex scenarios.
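
The abstract describes an agent that outputs the MPC prediction horizon and weights, scored by a reward on tracking accuracy and stability. The following is only a minimal sketch of that idea, not the authors' implementation: the parameter ranges, the four-element action layout, the use of sideslip angle as the stability term, and all names (`MpcParams`, `action_to_params`, `reward`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MpcParams:
    horizon: int      # prediction horizon, in steps
    q_lateral: float  # weight on lateral deviation
    q_heading: float  # weight on heading error
    r_steer: float    # weight on steering-rate effort


def scale(a: float, lo: float, hi: float) -> float:
    """Clamp a normalized action to [-1, 1] and map it linearly onto [lo, hi]."""
    a = max(-1.0, min(1.0, a))
    return lo + (a + 1.0) * 0.5 * (hi - lo)


def action_to_params(action) -> MpcParams:
    """Turn a 4-element actor output into bounded MPC parameters.

    The ranges below are hypothetical; the paper's actual bounds are not
    given in the abstract.
    """
    h, ql, qh, r = action
    return MpcParams(
        horizon=int(round(scale(h, 10, 40))),
        q_lateral=scale(ql, 1.0, 200.0),
        q_heading=scale(qh, 1.0, 100.0),
        r_steer=scale(r, 0.1, 50.0),
    )


def reward(lateral_dev, heading_err, sideslip, steer_rate,
           w=(1.0, 0.5, 0.5, 0.1)) -> float:
    """Negative weighted sum of squared tracking and stability terms.

    lateral_dev and heading_err stand in for tracking accuracy; sideslip and
    steer_rate stand in for stability (an assumed choice of signals).
    """
    w_e, w_psi, w_beta, w_rate = w
    return -(w_e * lateral_dev ** 2 + w_psi * heading_err ** 2
             + w_beta * sideslip ** 2 + w_rate * steer_rate ** 2)
```

In a training loop, a DDPG or TD3 actor would emit the action at each step, `action_to_params` would configure the MPC for that step, and `reward` would score the resulting vehicle response. Bounding the action before scaling keeps the horizon and weights inside a range the solver can handle.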

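Keywords: automotive engineering; path tracking; model predictive control; reinforcement learning; control parameter tuning; additional yaw moment control

Classification: U469.72 [Mechanical Engineering: Vehicle Engineering]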
