Authors: CHEN Liangfa; SONG Xujie; XIAO Liming; GAO Lulu [1]; ZHANG Fawang; LI Shengbo [2]; MA Fei [1]; DUAN Jingliang
Affiliations: [1] School of Mechanical Engineering, University of Science and Technology Beijing, Beijing 100083, China; [2] School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China; [3] School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China
Source: Journal of Harbin Institute of Technology, 2024, Issue 12, pp. 116-123 (8 pages)
Funding: National Natural Science Foundation of China (52202487); Open Fund of the State Key Laboratory of Automotive Safety and Energy (KF2212)
Abstract: To address the action-fluctuation problem in existing articulated vehicle trajectory tracking control, and to improve both tracking accuracy and control smoothness, a smooth reinforcement learning (RL) tracking control method with trajectory preview is proposed. First, to ensure control accuracy, reference-trajectory information is introduced into the RL policy and value networks as preview input, establishing a preview-based RL policy iteration framework. Second, to ensure control smoothness, the LipsNet network structure is adopted to approximate the policy function, realizing adaptive limitation of the policy network's Lipschitz constant. Finally, combining distributional RL theory, the resulting trajectory tracking controller, named smooth distributional soft actor-critic (SDSAC), jointly optimizes tracking accuracy and action smoothness for articulated vehicles. Simulation results show that SDSAC maintains good action smoothness and high tracking accuracy under six different noise levels. Compared with the conventional distributional soft actor-critic (DSAC), SDSAC improves action smoothness by more than 5.8 times under high-noise conditions. In addition, compared with model predictive control, SDSAC's average single-step solution is about 60 times faster, giving it high online computational efficiency.
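Note: The smoothing mechanism named in the abstract is the LipsNet policy approximator, which adaptively limits the policy's local Lipschitz constant with a learned, state-dependent bound. The following is a minimal, hypothetical PyTorch sketch of that idea, assuming a LipsNet-style layer of the form f(x) = K(x) * g(x) / (||dg/dx||_F + eps); the class name, layer sizes, and the Softplus-bounded K(x) head are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn
    from torch.func import jacrev, vmap  # requires PyTorch >= 2.0

    class LipsNetStylePolicy(nn.Module):
        """Hypothetical LipsNet-style policy head: rescales a raw MLP output
        g(x) by a learned, state-dependent Lipschitz bound K(x) divided by the
        Jacobian norm of g, so the overall map's local Lipschitz constant is
        adaptively limited (the smoothing idea described in the abstract)."""

        def __init__(self, obs_dim: int, act_dim: int,
                     hidden: int = 256, eps: float = 1e-4):
            super().__init__()
            self.eps = eps
            # g(x): ordinary MLP producing the raw, unconstrained action
            self.g = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.GELU(),
                nn.Linear(hidden, hidden), nn.GELU(),
                nn.Linear(hidden, act_dim),
            )
            # K(x): small head predicting a positive, per-state Lipschitz bound
            self.k = nn.Sequential(
                nn.Linear(obs_dim, 64), nn.GELU(),
                nn.Linear(64, 1), nn.Softplus(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            raw = self.g(x)                                   # (B, act_dim)
            jac = vmap(jacrev(self.g))(x)                     # dg/dx per sample, (B, act_dim, obs_dim)
            jnorm = jac.flatten(1).norm(dim=1, keepdim=True)  # Frobenius norm per sample
            return self.k(x) * raw / (jnorm + self.eps)       # Lipschitz-limited action

    # Usage: per the abstract's preview scheme, the observation would stack the
    # vehicle tracking-error state with N previewed reference points (the sizes
    # below are made up for illustration).
    policy = LipsNetStylePolicy(obs_dim=4 + 2 * 5, act_dim=2)
    obs = torch.randn(32, 14)
    actions = policy(obs)  # smooth steering/drive commands, shape (32, 2)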
Classification Code: TP273 [Automation and Computer Technology: Detection Technology and Automatic Devices]