End-to-end Motion Planning of Unmanned Vehicles Based on Multimodal Deep Reinforcement Learning

Authors: Ding Kaiyuan; Askar Hamdulla [1,2]; Zhu Bin; Eksan Firkat; Ma Zhengtang

Affiliations: [1] School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China; [2] Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China; [3] Department of Automation, Tsinghua University, Beijing 100084, China

Source: Journal of System Simulation (《系统仿真学报》), 2024, No. 11, pp. 2631-2643 (13 pages)

Abstract: When reinforcement learning is applied to robot motion planning, the agent cannot perceive its surroundings and cannot reliably avoid obstacles, so the learned policy fails to generalize to complex, challenging terrain. To address these problems, a method based on multimodal deep reinforcement learning is proposed for the motion planning of unmanned vehicles; it learns to combine proprioceptive states with high-dimensional depth sensor inputs. Specifically, proprioceptive states provide contact measurements for immediate reaction, while the onboard visual sensor lets the vehicle learn to predict environmental changes and maneuver proactively, several time steps ahead, around obstacles and over uneven terrain. A novel end-to-end multimodal Transformer fusion model, TransProAct (transformer-based proactive action), is proposed. Its self-attention mechanism fuses proprioceptive states and visual information, and the deep reinforcement learning algorithm PPO is used to train the unmanned vehicle to learn motion planning on its own. In addition, multimodal delay randomization is introduced to reduce the gap between simulation and the real world. Evaluated in challenging simulation environments with different obstacles and uneven terrain, the proposed approach significantly improves on the baseline and generalizes considerably better.
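The abstract names multimodal delay randomization but does not say how it is implemented. The following is a minimal sketch of one plausible reading, assuming each sensor stream (proprioception, depth) receives its own latency, resampled at every episode reset, so the policy cannot rely on the two modalities being synchronized. The names `DelayedModality` and `make_delayed_sensors` and the delay bounds are hypothetical and do not come from the paper.

```python
import random
from collections import deque


class DelayedModality:
    """Buffers one sensor stream and replays it with a random per-episode delay.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, max_delay, rng):
        self.max_delay = max_delay  # upper bound on latency, in control steps
        self.rng = rng
        self.buffer = deque(maxlen=max_delay + 1)
        self.delay = 0

    def reset(self, first_obs):
        # Sample a new latency at the start of each episode.
        self.delay = self.rng.randint(0, self.max_delay)
        self.buffer.clear()
        # Pre-fill so that a delayed read is defined from the first step.
        for _ in range(self.max_delay + 1):
            self.buffer.append(first_obs)
        return first_obs

    def step(self, obs):
        # Store the newest reading, then return the one from `delay` steps ago.
        self.buffer.append(obs)
        return self.buffer[-1 - self.delay]


def make_delayed_sensors(seed=0, proprio_max_delay=1, depth_max_delay=4):
    """Build one delay buffer per modality (the bounds are assumed values)."""
    rng = random.Random(seed)
    return {
        "proprio": DelayedModality(proprio_max_delay, rng),
        "depth": DelayedModality(depth_max_delay, rng),
    }


if __name__ == "__main__":
    sensors = make_delayed_sensors(seed=0)
    for sensor in sensors.values():
        sensor.reset(0)
    for t in range(1, 6):
        # Each modality sees a reading that may lag by a different amount.
        delayed = {name: s.step(t) for name, s in sensors.items()}
        print(t, delayed)
```

In a training loop, the delayed readings rather than the true simulator state would be fed to the policy. Giving the depth stream a larger assumed bound than the proprioceptive stream reflects the usual situation in which camera pipelines lag behind onboard state estimates.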

Keywords: multimodal perception; reinforcement learning; unmanned vehicle; motion planning; neural network

Classification code: TP242.6 [Automation and Computer Technology: Detection Technology and Automatic Equipment]
