Safety-Priority Path Planning Method Based on the Safe-PPO Algorithm

Authors: BIE Tong; ZHU Xiaoqing; FU Yu[3]; LI Xiaoli; RUAN Xiaogang; WANG Quanmin (School of Artificial Intelligence and Automation, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; School of Computer Science, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China)

Affiliations: [1] School of Artificial Intelligence and Automation, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; [2] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; [3] School of Computer Science, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Source: Journal of Beijing University of Aeronautics and Astronautics, 2023, Issue 8, pp. 2108-2118 (11 pages)

Funding: National Natural Science Foundation of China (61773027, 62103009); Beijing Natural Science Foundation (4202005).

Abstract: Existing path planning algorithms rarely account for path safety, and the traditional proximal policy optimization (PPO) algorithm suffers from a variance-adaptability problem. To address these issues, a safe proximal policy optimization (Safe-PPO) algorithm is proposed that combines ideas from evolutionary strategies with a safety reward function, planning paths with safety as the first priority. Ideas from the covariance matrix adaptation evolution strategy (CMA-ES) are used to improve the PPO algorithm, and a hazard coefficient and a movement factor are introduced to evaluate path safety. Simulation experiments on a 2D grid map compare the traditional PPO algorithm with Safe-PPO, and physical experiments are carried out with a hexapod robot in a constructed scene. The simulation results show that the proposed algorithm is reasonable and feasible for safety-oriented path planning: during training, Safe-PPO converges 18% faster than the traditional PPO algorithm and obtains a 5.3% higher reward; during testing, combining the hazard coefficient with the movement factor enables the robot to learn to choose the safer path rather than the intuitively fastest one. The physical experiments demonstrate that the robot can select a safer path to the goal point in a real-world environment.
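The abstract does not give the paper's exact reward formulation, only that a hazard coefficient and a movement (action) factor are combined to score path safety on a grid map. The sketch below is a minimal, hypothetical illustration of how such a safety-shaped step reward might look; the function name `safety_reward`, the weights `w_hazard` and `w_move`, and the terminal bonus are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def safety_reward(grid, pos, goal, moved, w_hazard=1.0, w_move=0.1):
    """Hypothetical safety-shaped step reward for grid-map path planning.

    grid  : 2D array of per-cell hazard coefficients in [0, 1]
            (0 = safe, 1 = maximally dangerous) -- an assumed encoding.
    pos   : (row, col) of the agent after the step.
    goal  : (row, col) of the target cell.
    moved : True if the action changed the agent's cell.
    """
    if pos == goal:
        return 10.0  # assumed terminal bonus for reaching the goal
    hazard_penalty = -w_hazard * grid[pos]            # penalize dangerous cells
    move_penalty = -w_move if moved else -2 * w_move  # action factor: idling costs more
    return hazard_penalty + move_penalty

# Toy 3x3 hazard map: the centre cell is dangerous.
hazard_map = np.array([[0.0, 0.0, 0.0],
                       [0.0, 0.9, 0.0],
                       [0.0, 0.0, 0.0]])
r_safe = safety_reward(hazard_map, (0, 1), (2, 2), moved=True)   # detour cell
r_risky = safety_reward(hazard_map, (1, 1), (2, 2), moved=True)  # hazardous cell
assert r_safe > r_risky  # the detour is rewarded over the hazardous shortcut
```

Under a shaping like this, a policy trained with PPO would accumulate more reward along low-hazard detours than along a shorter route through dangerous cells, which matches the paper's reported behaviour of choosing the safer path over the intuitively fastest one.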

Keywords: robot navigation; path planning; deep reinforcement learning; proximal policy optimization; safe path selection

Classification: TP242.6 [Automation and Computer Technology — Detection Technology and Automatic Equipment]

 
