Method and application of intelligent vehicle path planning based on MCPDDPG  Cited by: 13


Authors: YU Ling-li (余伶俐) [1]; WEI Ya-dong (魏亚东); HUO Shu-xin (霍淑欣)

Affiliation: [1] College of Automation, Central South University, Changsha 410083, China

Source: Control and Decision (《控制与决策》), 2021, No. 4, pp. 835-846 (12 pages)

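Funding: National Key R&D Program of China (2018YFB1201602); National Natural Science Foundation of China (61976224); Hunan Provincial Science and Technology Major Project (2017GK1010).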

Abstract: To address the insufficient perception and prediction of dynamic environments during intelligent vehicle path planning, we use an intelligent vehicle path planning method based on Monte Carlo prediction deep deterministic policy gradient (MCPDDPG) and design a framework built on environment perception prediction, behavior decision-making, and control sequence generation. The framework performs decision-making and planning in real time and outputs continuous vehicle control sequences. First, we use sequential Monte Carlo to estimate the behavioral state of other vehicles. Then, we design a behavior decision method based on reinforcement Q-learning so that the intelligent vehicle can anticipate collision risks in real time and adopt reasonable avoidance strategies. Finally, we build a deep deterministic policy gradient learning network to obtain the optimal trajectory sequence for the planned path. Experimental results show that the proposed method alleviates the insufficient prediction in environment perception, speeds up behavior decision-making, ensures the active safety of path planning, and outputs a continuous trajectory sequence, which provides a basis for intelligent vehicle navigation control.

Keywords: path planning; Monte Carlo prediction; intelligent vehicle; deep policy gradient; reinforcement learning; decision-making

Classification code: TP273 [Automation and Computer Technology: Detection Technology and Automatic Devices]

 
