Research on Robot Navigation Algorithm Based on Deep Reinforcement Learning  Cited by: 1


Authors: Xiong Liyan[1], Shu Yaosong, Zeng Hui[1], Huang Xiaohui[1] (School of Information Engineering, East China Jiaotong University, Nanchang 330013, China)

Affiliation: [1] School of Information Engineering, East China Jiaotong University, Nanchang, Jiangxi 330013, China

Source: Journal of East China Jiaotong University, 2023, No. 1, pp. 67-74 (8 pages)

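Funding: National Natural Science Foundation of China (62067002, 61967006, 62062033); Natural Science Foundation of Jiangxi Province (20212BAB202008); Science and Technology Project of the Jiangxi Provincial Department of Transportation (2022X0040).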

Abstract: When a mobile robot moves through a dense, dynamic crowd, insufficient understanding of environmental information leads to low navigation efficiency and weak generalization. To address this problem, a double-attention deep reinforcement learning algorithm is proposed. First, the sparse reward function is optimized by introducing a distance penalty term and a comfort distance, so that the robot approaches the target while maintaining navigation safety. Second, a state value network based on double attention is designed to process environmental information, giving the navigation system both environmental understanding and real-time decision-making ability. Finally, the algorithm is verified in a simulation environment. Experimental results show that the proposed algorithm improves both the navigation efficiency and the robustness of the navigation system: in 500 random test scenarios, the numbers of collisions and timeouts are both 0, the navigation success rate is better than that of the comparison algorithms, and the average navigation time is 2% shorter than that of the best comparison algorithm. When the number of pedestrians or the navigation distance changes, the algorithm remains effective, and its navigation time is shorter than that of the comparison algorithms.
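The abstract names a distance penalty term and a comfort distance but does not give the reward formula. The Python sketch below is therefore only an illustration of how such a shaped reward might look. The function name `shaped_reward`, the linear form of each term, and every constant are assumptions made for this example and are not taken from the paper.

```python
def shaped_reward(prev_goal_dist, goal_dist, min_ped_dist, *,
                  goal_radius=0.3, comfort_dist=0.2,
                  success_reward=1.0, collision_penalty=-0.25,
                  discomfort_scale=0.5, progress_scale=0.1):
    """Illustrative per-step reward (all constants are assumed values).

    prev_goal_dist: robot-to-goal distance at the previous step (m)
    goal_dist:      robot-to-goal distance at the current step (m)
    min_ped_dist:   surface-to-surface distance to the nearest
                    pedestrian (m); <= 0 means the bodies overlap
    """
    # Terminal case: collision with a pedestrian.
    if min_ped_dist <= 0.0:
        return collision_penalty
    # Terminal case: goal reached.
    if goal_dist <= goal_radius:
        return success_reward

    # Assumed distance term: positive when the robot gets closer to
    # the goal, negative (a penalty) when it moves away.
    reward = progress_scale * (prev_goal_dist - goal_dist)

    # Assumed comfort term: penalize intruding into the comfort
    # distance around the nearest pedestrian.
    if min_ped_dist < comfort_dist:
        reward -= discomfort_scale * (comfort_dist - min_ped_dist)
    return reward
```

Under these assumptions the distance term gives a dense signal at every step, in contrast to a sparse reward that is nonzero only on success or collision, while the comfort term discourages passing too close to pedestrians even when no collision occurs.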

Keywords: deep reinforcement learning; reward function; state value network; double attention

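Classification: U495 [Transportation Engineering: Transportation Planning and Management]; TP242 [Transportation Engineering: Road and Railway Engineering]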

 
