Authors: Ding Kaiyuan, Askar Hamdulla[1,2], Zhu Bin, Eksan Firkat, Ma Zhengtang
Affiliations: [1] School of Computer Science and Technology, Xinjiang University, Urumqi, Xinjiang 830017, China; [2] Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi, Xinjiang 830017, China; [3] Department of Automation, Tsinghua University, Beijing 100084, China
Source: Journal of System Simulation (系统仿真学报), 2024, Issue 11, pp. 2631-2643 (13 pages)
Abstract: When reinforcement learning is applied to robot motion planning, the agent cannot perceive its surroundings or reliably avoid obstacles, so the learned policy fails to generalize to complex, challenging terrain. To address this, a multimodal deep reinforcement learning approach is proposed for unmanned-vehicle motion planning; it learns to combine proprioceptive states with high-dimensional depth-sensor inputs. Specifically, the proprioceptive states provide contact measurements for immediate reaction, while the onboard visual sensors let the unmanned vehicle learn to predict environmental changes and proactively maneuver around obstacles and uneven terrain several time steps ahead. A new end-to-end multimodal Transformer fusion model, TransProAct (transformer-based proactive action), is proposed: its self-attention mechanism fuses proprioceptive states with visual information, and the deep reinforcement learning algorithm PPO is used to train the unmanned vehicle to learn motion planning on its own. Multimodal delay randomization is introduced to bridge the gap between simulation and the real world. Evaluations in challenging simulated environments with various obstacles and uneven terrain show that the multimodal deep reinforcement learning approach not only significantly outperforms the baseline but also greatly improves generalization.
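To make the fusion idea in the abstract more concrete, the following is a minimal sketch of generic single-head scaled dot-product self-attention applied to one proprioceptive token concatenated with several visual tokens. It is not the authors' TransProAct implementation: the token layout, the assumption that both modalities are already embedded into the same dimension, the mean-pooling step, and the helper names (`self_attention`, `fuse`) are all hypothetical choices made for illustration. Pure Python is used so the example needs no external libraries.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector], wq: Matrix, wk: Matrix,
                   wv: Matrix) -> Tuple[List[Vector], List[Vector]]:
    """Single-head scaled dot-product self-attention over a token sequence.

    Returns (outputs, attention_weights); each row of attention_weights
    sums to 1 and says how much each token attends to every other token.
    """
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    d_k = len(k[0])
    outputs, weights = [], []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        alpha = softmax(scores)
        weights.append(alpha)
        # Weighted sum of value vectors, one output channel at a time.
        out = [sum(alpha[j] * v[j][c] for j in range(len(v)))
               for c in range(len(v[0]))]
        outputs.append(out)
    return outputs, weights


def fuse(proprio_token: Vector, visual_tokens: List[Vector],
         wq: Matrix, wk: Matrix, wv: Matrix) -> Vector:
    """Hypothetical fusion step: prepend the proprioceptive token to the
    visual tokens, let every token attend to every other token, then
    mean-pool the outputs into one feature vector a policy could consume."""
    tokens = [proprio_token] + visual_tokens
    outputs, _ = self_attention(tokens, wq, wk, wv)
    n = len(outputs)
    return [sum(o[c] for o in outputs) / n for c in range(len(outputs[0]))]


# Usage: identity projections stand in for learned weights (an assumption).
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
proprio = [0.2, -0.1, 0.5]                      # e.g. embedded joint/contact state
visual = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]     # e.g. embedded depth-image patches
feature = fuse(proprio, visual, identity, identity, identity)
print(len(feature))  # 3
```

Because attention lets the proprioceptive token and each visual token weight one another, the pooled feature mixes both modalities in a single vector. In a full system the projections would be learned and the fused feature would feed the actor and critic networks trained with PPO.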
Keywords: multimodal perception; reinforcement learning; unmanned vehicle; motion planning; neural network
CLC number: TP242.6 [Automation and Computer Technology - Detection Technology and Automatic Devices]