Authors: 杨璐 YANG Lu [1,2]; 王一权 WANG Yiquan; 刘佳琦 LIU Jiaqi [1,2]; 段玉林 DUAN Yulin; 张荣辉 ZHANG Ronghui
Affiliations: [1] Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin 300384, China; [2] National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin 300384, China; [3] Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100081, China; [4] Guangdong Provincial Key Laboratory of Intelligent Transport System, Sun Yat-sen University, Guangzhou 510275, China
Source: Journal of Transport Information and Safety (《交通信息与安全》), 2022, No. 1, pp. 144-152 (9 pages)
Funding: Supported by the International Agricultural Science Plan Project of the Chinese Academy of Agricultural Sciences (CAAS-ZDRW202107); the National Natural Science Foundation of China (52172350, 51775565); and the Tianjin Research Innovation Project for Postgraduate Students (2020YJSZXS05).
Abstract: Reinforcement-learning-based driving decision methods suffer from low learning efficiency and non-smooth action changes. To address these issues, an end-to-end decision-making method for autonomous driving is proposed: the Twin Delayed Deep Deterministic Policy Gradient with Discrete actions (TD3WD) algorithm, which fuses networks with different action spaces. An additional Q network that outputs discrete actions is added to the network of the basic Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to assist exploration during training. The output actions of the TD3 network and the additional Q network are fused by weighting, and the fused action interacts with the environment, so that the environment is explored more thoroughly and exploration efficiency is improved. When the Critic network is updated, the output of the additional network is merged into the target action as noise, which encourages the agent to explore the environment and makes the action-value estimates more accurate. Image features obtained from a pre-trained network, rather than the raw images, are used as the state input to reduce the computational cost of training. The proposed method is validated in autonomous driving scenarios simulated on the Carla platform. The results show that, in the training scenarios, the proposed method learns more efficiently and converges about 30% faster than baseline algorithms such as TD3 and Deep Deterministic Policy Gradient (DDPG); in the test scenarios, the converged policy performs better, with the average lane-crossing rate and the variation of the steering-wheel angle reduced by 74.4% and 56.4%, respectively.
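The weighted action fusion described in the abstract can be illustrated with a short PyTorch sketch. The feature dimension, the discrete steer/throttle candidates, the fusion weight w, and all class and function names below are assumptions made only for illustration; the paper does not publish code, and the abstract does not specify these details.

    # Minimal sketch of the TD3WD action-fusion step (assumed details, not the authors' implementation).
    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """TD3 actor: maps image features to a continuous [steer, throttle] action."""
        def __init__(self, feat_dim, act_dim=2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feat_dim, 256), nn.ReLU(),
                nn.Linear(256, act_dim), nn.Tanh())
        def forward(self, feat):
            return self.net(feat)

    class DiscreteQ(nn.Module):
        """Additional Q network over a small set of discrete candidate actions."""
        def __init__(self, feat_dim, n_discrete):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feat_dim, 256), nn.ReLU(),
                nn.Linear(256, n_discrete))
        def forward(self, feat):
            return self.net(feat)

    # Hypothetical coarse grid of (steer, throttle) pairs for the discrete branch.
    DISCRETE_ACTIONS = torch.tensor([
        [-0.5, 0.5], [0.0, 0.5], [0.5, 0.5],
        [-0.5, 1.0], [0.0, 1.0], [0.5, 1.0]])

    def fused_action(actor, q_net, feat, w=0.7):
        """Weighted fusion of the continuous TD3 action and the greedy discrete
        action; the fused action is what interacts with the simulator."""
        with torch.no_grad():
            a_cont = actor(feat)                      # continuous action from TD3 actor
            idx = q_net(feat).argmax(dim=-1)          # greedy index from the additional Q network
            a_disc = DISCRETE_ACTIONS[idx]            # its (steer, throttle) pair
            return w * a_cont + (1.0 - w) * a_disc    # weighted fusion (w is an assumed value)

    # Example usage with random image features (feat_dim = 512 assumed):
    feat = torch.randn(1, 512)
    actor, q_net = Actor(512), DiscreteQ(512, len(DISCRETE_ACTIONS))
    print(fused_action(actor, q_net, feat))

The same greedy discrete action could also be blended into the target action as noise during the Critic update, which is the second use of the additional network that the abstract describes; the weighting scheme above is only one plausible reading of "weighted fusion".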
Classification Codes: U463.6 [Mechanical Engineering - Vehicle Engineering]; TP181 [Transportation Engineering - Vehicle Operation Engineering]