Deep Reinforcement Learning Navigation Algorithm for Coexisting-Cooperative-Cognitive Robots in Dynamic Environment


Authors: GU Jinhao; KUANG Liqun [1,2,3]; HAN Huiyan; CAO Yaming; JIAO Shichao (School of Computer Science and Technology, North University of China, Taiyuan 030051, China; Shanxi Provincial Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China; Shanxi Province's Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China)

Affiliations: [1] School of Computer Science and Technology, North University of China, Taiyuan 030051; [2] Shanxi Provincial Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051; [3] Shanxi Province's Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051

Source: Computer Engineering and Applications, 2025, No. 4, pp. 90-98 (9 pages)

Funding: National Natural Science Foundation of China (62272426); Shanxi Province Science and Technology Major Special Plan "Open Competition" Project (202201150401021); Shanxi Provincial Natural Science Foundation (202303021211153, 202203021222027, 202303021212189, 202203021212138); Shanxi Province Special Guidance Project for the Transformation of Scientific and Technological Achievements (202104021301055).

Abstract: Over the past few decades, navigation algorithms for mobile service robots have been studied extensively, yet agents still lack the sophistication and cooperativeness that humans exhibit in crowded environments. As human-robot coexistence applications continue to expand, collaboration between robots and humans in shared workspaces will become increasingly important; the next generation of mobile service robots must therefore meet social requirements in order to be accepted by humans. To enhance the autonomous navigation ability of multi-agent systems in dynamic scenarios, and to address the low social adaptability and the difficulty of finding the optimal value function in multi-agent navigation, a deep reinforcement learning obstacle avoidance algorithm for coexisting-cooperative-cognitive robots in dynamic environments is proposed. A motion model that more closely matches human behavior is established and added to the deep reinforcement learning framework to improve the cooperativeness of coexisting-cooperative-cognitive robots. To enhance pedestrians' perceived safety on top of their physical safety, the reward function is redefined. A nonlinear deep neural network replaces the traditional value function to solve the problem of finding the optimal value function. Simulation experiments show that, compared with the latest deep reinforcement learning navigation methods, the proposed method achieves a 100% navigation success rate without increasing navigation time and without any collisions. The results indicate that the method lets coexisting-cooperative-cognitive robots satisfy human social principles as much as possible while moving toward the goal, effectively improving pedestrians' perceived safety.
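The abstract names two concrete mechanisms: a reshaped reward that scores pedestrians' perceived safety (comfort distance) alongside their physical safety (collision distance), and a nonlinear network that stands in for a traditional value function. A minimal sketch of how these two pieces could fit together is given below; every threshold, penalty weight, and the network shape are illustrative placeholders of my own, not values taken from the paper.

```python
import math

def social_reward(dist_to_goal, min_ped_dist,
                  collision_dist=0.3, comfort_dist=1.0):
    """Illustrative step reward: hard penalty for physical collisions,
    soft graded penalty for intruding on pedestrians' comfort zone
    (perceived safety), and a terminal bonus for reaching the goal.
    All constants are placeholder assumptions."""
    if min_ped_dist < collision_dist:      # physical safety violated
        return -0.25
    if min_ped_dist < comfort_dist:        # perceived safety: soft penalty
        return -0.1 * (comfort_dist - min_ped_dist)
    if dist_to_goal < 0.2:                 # goal reached
        return 1.0
    return 0.0                             # neutral navigation step

def value_net(state, w1, b1, w2, b2):
    """One hidden tanh layer: a tiny nonlinear stand-in for the value
    function that a tabular or linear method would otherwise store."""
    hidden = [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2
```

In a full training loop the network weights would be fitted (e.g. by temporal-difference learning) so that `value_net` predicts the discounted sum of `social_reward` terms, which is what lets the planner trade goal progress against pedestrian comfort.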

Keywords: service robot; obstacle avoidance algorithm; deep reinforcement learning; optimal value function; reward function

CLC Number: TP18 (Automation and Computer Technology: Control Theory and Control Engineering); TP242 (Automation and Computer Technology: Control Science and Engineering)

 
