Authors: Xiong Liyan (熊李艳)[1]; Shu Yaosong (舒垚淞); Zeng Hui (曾辉)[1]; Huang Xiaohui (黄晓辉)[1]
Affiliation: [1] School of Information Engineering, East China Jiaotong University, Nanchang 330013, Jiangxi, China
Source: Journal of East China Jiaotong University (《华东交通大学学报》), 2023, No. 1, pp. 67-74 (8 pages)
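Funding: National Natural Science Foundation of China (62067002, 61967006, 62062033); Natural Science Foundation of Jiangxi Province (20212BAB202008); Science and Technology Project of the Jiangxi Provincial Department of Transportation (2022X0040).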
Abstract: When a mobile robot moves through a dense, dynamic crowd, an insufficient understanding of the environment leads to low navigation efficiency and weak generalization. To address this problem, a double-attention deep reinforcement learning algorithm is proposed. First, the sparse reward function is optimized by introducing a distance penalty term and a comfort distance, so that the robot approaches the target while navigation safety is also taken into account. Second, a state value network based on double attention is designed to process environmental information, giving the navigation system both environmental understanding and real-time decision-making ability. Finally, the algorithm is verified in a simulation environment. The experimental results show that the proposed algorithm improves both navigation efficiency and the robustness of the navigation system. In 500 random test scenarios, the numbers of collisions and timeouts are both 0, the navigation success rate is higher than that of the comparison algorithms, and the average navigation time is 2% shorter than that of the best comparison algorithm. When the number of pedestrians or the navigation distance changes, the algorithm remains effective and its navigation time is shorter than that of the comparison algorithms.
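The abstract does not give the reward formula, so the following is only a minimal sketch of how a sparse goal/collision reward might be densified with a distance penalty term and a comfort distance. Every name and constant here (`RewardConfig`, `shaped_reward`, the reward magnitudes, the 0.2 m comfort distance) is a hypothetical choice made for illustration and is not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    # All values below are illustrative assumptions, not the paper's settings.
    goal_reward: float = 1.0              # sparse reward for reaching the goal
    collision_penalty: float = -0.25      # sparse penalty for hitting a pedestrian
    comfort_distance: float = 0.2         # clearance (m) below which a penalty applies
    discomfort_scale: float = 0.5         # weight of the comfort-distance penalty
    distance_penalty_scale: float = 0.01  # weight of the distance-to-goal penalty


def shaped_reward(goal_dist: float, min_ped_dist: float,
                  cfg: RewardConfig = RewardConfig()) -> float:
    """Return a per-step reward.

    goal_dist: remaining distance to the goal; <= 0 means the goal is reached.
    min_ped_dist: smallest clearance to any pedestrian; < 0 means a collision.
    """
    if min_ped_dist < 0:
        return cfg.collision_penalty
    if goal_dist <= 0:
        return cfg.goal_reward
    # Dense term: the farther from the goal, the larger the per-step penalty.
    reward = -cfg.distance_penalty_scale * goal_dist
    # Comfort term: penalize intruding into a pedestrian's comfort zone.
    if min_ped_dist < cfg.comfort_distance:
        reward -= cfg.discomfort_scale * (cfg.comfort_distance - min_ped_dist)
    return reward
```

Under these assumed weights, a step taken 2 m from the goal with ample clearance yields a small negative reward, and the same step taken 0.1 m from a pedestrian yields a more negative one, which is the kind of trade-off between reaching the target and keeping a safe distance that the abstract describes.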
Keywords: deep reinforcement learning; reward function; state value network; double attention
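Classification: U495 [Transportation Engineering: Transportation Planning and Management]; TP242 [Transportation Engineering: Road and Railway Engineering]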