Vehicle Trajectory Tracking and Collision Avoidance Control Based on Multi-style Reinforcement Learning


Authors: Xiao Liming; Zhang Fawang; Chen Liangfa; Yan Haoqi; Ma Fei [1]; Li Shengbo Eben [3]; Duan Jingliang

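Affiliations: [1] School of Mechanical Engineering, University of Science and Technology Beijing, Beijing 100083; [2] School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081; [3] School of Vehicle and Mobility, Tsinghua University, Beijing 100084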

Source: Automotive Engineering (《汽车工程》), 2024, No. 6, pp. 945-955 (11 pages)

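Funding: Supported by the National Natural Science Foundation of China (52202487, 62273256) and the Fundamental Research Funds for the Central Universities (FRF-OT-23-02).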

Abstract: Trajectory tracking with collision avoidance is a key capability of intelligent vehicles. Existing control methods produce only a single control style for a given scenario. To address this limitation, this paper proposes a multi-style reinforcement learning (RL) control method. To diversify control styles, style indicators are introduced into the value and policy networks for the first time, yielding a multi-style tracking and collision avoidance policy network. A multi-style policy iteration framework is then built on distributional RL theory, and a multi-style distributional soft actor-critic algorithm (M-DSAC) is derived from it. Simulation and real-vehicle tests show that the proposed method completes trajectory tracking and collision avoidance tasks in several driving styles (aggressive, neutral, and conservative). On the real vehicle, the steady-state trajectory tracking error is less than 5 cm, indicating high control accuracy, and the average single-step decision time is 6.07 ms, which meets real-time requirements.
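The abstract does not say how the style indicator enters the networks. As a rough illustration only, one common way to condition a policy on a discrete style is to append a one-hot style vector to the state before the first layer. The sketch below assumes that design; the class name `StyleConditionedPolicy`, the layer sizes, the random untrained weights, and the example state are all hypothetical and are not taken from the paper or from M-DSAC.

```python
import math
import random

# The three driving styles named in the abstract.
STYLES = ("aggressive", "neutral", "conservative")


def one_hot(style):
    """Encode a style name as a one-hot vector (an assumed encoding)."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    return [1.0 if s == style else 0.0 for s in STYLES]


class StyleConditionedPolicy:
    """Tiny two-layer policy whose input is [state, one-hot style].

    Hypothetical illustration: weights are random and untrained.
    """

    def __init__(self, state_dim, action_dim, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        in_dim = state_dim + len(STYLES)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden_dim)]
                   for _ in range(action_dim)]
        self.b2 = [0.0] * action_dim

    def act(self, state, style):
        """Return an action in [-1, 1]^action_dim for the given style."""
        x = list(state) + one_hot(style)
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return [math.tanh(sum(w * hi for w, hi in zip(row, hidden)) + b)
                for row, b in zip(self.w2, self.b2)]


# Same (made-up) vehicle state, three styles: one network, three actions.
policy = StyleConditionedPolicy(state_dim=4, action_dim=2)
state = [0.1, -0.2, 0.05, 1.0]
actions = {style: policy.act(state, style) for style in STYLES}
```

With this kind of conditioning, a single set of weights can serve every style, and the style can be switched at run time by changing only the indicator. A value network could take the same extra input, which would match the abstract's statement that the indicator enters both networks.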

Keywords: multi-style; distributional reinforcement learning; trajectory tracking; active collision avoidance

Classification number: U463.6 [Mechanical Engineering: Vehicle Engineering]

 
