Authors: ZHAO Tianliang, ZHANG Xiaojun, ZHANG Minglu, CHEN Jianwen (School of Mechanical Engineering, Hebei University of Technology, Tianjin 300401, China)
Source: Journal of Hebei University of Technology, 2024, No. 4, pp. 21-30 (10 pages)
Funding: Tianjin New Generation Artificial Intelligence Science and Technology Major Special Project (18ZXZNGX00230).
Abstract: To address the unstable convergence and low learning efficiency of the Deep Deterministic Policy Gradient (DDPG) algorithm when training neural networks, a Reward Guidance DDPG (RG_DDPG) algorithm is proposed. The algorithm builds a set of excellent experiences within each episode, which guides the intelligent car to make full use of past effective information and obtain a stable control policy. A reward-based prioritized experience replay mechanism is adopted to break the correlation between data samples, improve data utilization, reduce the blindness of the search process, and improve the convergence stability of the algorithm. The algorithm is verified on the Robot Operating System (ROS). In the Gazebo modeling software, an intelligent car model and an obstacle environment are designed, and the decision algorithm is used to plan a safe driving path for the intelligent car. The results verify the effectiveness of the RG_DDPG algorithm in path planning tasks: compared with the DDPG algorithm, the speed of the improved intelligent car is increased by 60.5%, the obtained reward is more than doubled, and the convergence is more stable. Finally, the practicality of the algorithm is verified by real-vehicle experiments. (A minimal code sketch of the reward-prioritized replay idea follows this record.)
Keywords: intelligent car; autonomous driving; path planning; deep deterministic policy gradient; reward guidance
Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]
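Note: The following Python sketch illustrates the reward-based prioritized experience replay idea summarized in the abstract. It is not the authors' implementation; the class name, buffer capacity, and priority formula are assumptions made only for illustration.

import numpy as np
from collections import deque

class RewardPrioritizedReplayBuffer:
    """Samples transitions with probability proportional to a reward-based
    priority, sketching the reward-guided replay described in the abstract."""

    def __init__(self, capacity=100000, epsilon=1e-3):
        self.buffer = deque(maxlen=capacity)   # stores (transition, priority) pairs
        self.epsilon = epsilon                 # keeps every priority strictly positive

    def add(self, state, action, reward, next_state, done):
        # Use the immediate reward as the priority signal, so high-reward
        # ("excellent") experiences are replayed more often.
        priority = max(reward, 0.0) + self.epsilon
        self.buffer.append(((state, action, reward, next_state, done), priority))

    def sample(self, batch_size):
        # The probability of drawing each transition is proportional to its priority.
        priorities = np.array([p for _, p in self.buffer], dtype=np.float64)
        probs = priorities / priorities.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i][0] for i in idx]

In an RG_DDPG-style training loop, the actor and critic updates would draw minibatches from such a buffer instead of sampling uniformly at random.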