Reinforcement Imitation Learning Method Based on Collision Prediction for Robots Navigation

Times cited: 2

Authors: WANG Haojie (王浩杰), TAO Ye (陶冶)[1], LU Chaofeng (鲁超峰) (School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, Shandong 266100, China)

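Affiliation: [1] School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, Shandong 266100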

Source: Computer Engineering and Applications (《计算机工程与应用》), 2024, No. 10, pp. 341-352 (12 pages)

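Funding: National Key Research and Development Program of China (2018YFB1702902); Youth Innovation Science and Technology Support Program of Shandong Provincial Higher Education Institutions (2019KJN047).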

Abstract: Learning-based robot navigation methods depend heavily on training data and perform imperfectly in certain environments: for example, the robot fails to travel in a straight line through open space and has a high collision rate among dense obstacles. To improve navigation performance, a reinforcement imitation learning navigation method based on collision prediction is proposed. First, without a model of the robot, the state space, action space, and reward function required by the Markov decision process (MDP) are built according to the robot's capabilities. Deep reinforcement learning (DRL) is then used to train the policy in a simulation environment, giving the robot the ability to navigate and avoid obstacles in multi-obstacle environments. Next, the policy is further trained by imitation learning on collected expert data, which mitigates the imperfect performance of reinforcement learning in the two extreme cases of sparse and dense obstacles. Finally, a collision prediction model is designed that combines traditional control with deep learning: based on its predictions, the robot adaptively selects an appropriate control policy in each environment, which substantially improves navigation safety. Experiments in a large number of previously unseen scenarios verify the navigation performance and generalization ability of the proposed method.
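The adaptive switching between a learned policy and traditional control can be illustrated with a minimal sketch. The abstract does not specify the collision prediction model, the traditional controller, the state layout, or the switching rule, so everything below is a hypothetical assumption: the laser-scan `State`, the functions `predict_collision_prob`, `learned_policy`, `safe_controller`, and `hybrid_control`, and the threshold `0.5`. In particular, the collision predictor here is a simple distance heuristic standing in for the authors' learned model, and the two controllers are placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed action space: (linear velocity in m/s, angular velocity in rad/s).
Action = Tuple[float, float]


@dataclass
class State:
    """Hypothetical MDP state: laser ranges plus goal position in the robot frame."""
    scan: List[float]   # range readings in metres, index 0 = rightmost beam (assumption)
    goal_dist: float    # distance to goal in metres
    goal_angle: float   # heading error to goal in radians


def predict_collision_prob(state: State, collision_radius: float = 0.2,
                           safe_margin: float = 0.5) -> float:
    """Hypothetical stand-in for the learned collision predictor.

    Returns 1.0 when the nearest obstacle is within collision_radius,
    0.0 beyond collision_radius + safe_margin, and interpolates linearly between.
    """
    nearest = min(state.scan)
    score = 1.0 - (nearest - collision_radius) / safe_margin
    return max(0.0, min(1.0, score))


def learned_policy(state: State) -> Action:
    """Placeholder for the DRL + imitation-trained policy network (hypothetical)."""
    return (0.5, 0.8 * state.goal_angle)


def safe_controller(state: State) -> Action:
    """Placeholder traditional controller: creep forward and turn toward the freer side."""
    half = len(state.scan) // 2
    right = min(state.scan[:half])   # beams on the right half (assumed ordering)
    left = min(state.scan[half:])    # beams on the left half
    return (0.05, 0.6 if left > right else -0.6)


def hybrid_control(state: State, threshold: float = 0.5) -> Tuple[Action, str]:
    """Select the controller according to the predicted collision risk."""
    if predict_collision_prob(state) >= threshold:
        return safe_controller(state), "traditional"
    return learned_policy(state), "learned"


# Usage: an open scene keeps the learned policy; a cluttered one switches over.
open_space = State(scan=[3.0] * 8, goal_dist=4.0, goal_angle=0.1)
cluttered = State(scan=[0.3, 0.4, 0.35, 0.5, 1.5, 2.0, 2.5, 3.0],
                  goal_dist=4.0, goal_angle=0.1)
print(hybrid_control(open_space)[1])  # learned
print(hybrid_control(cluttered))      # ((0.05, 0.6), 'traditional')
```

A hard threshold keeps the switch easy to reason about; a practical system might add hysteresis so the robot does not chatter between the two controllers near the boundary.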

Keywords: navigation; reinforcement learning; imitation learning; collision prediction; hybrid control

CLC number: TP242 [Automation and Computer Technology: Detection Technology and Automatic Equipment]
