A Multi-UAV Collision Avoidance Decision-Making Method Based on Reinforcement Learning

Cited by: 3

Authors: YANG Yanfei; ZHU Yanping [2]; HU Can; ZHANG Bin

Affiliations: [1] School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213000, Jiangsu, China; [2] School of Microelectronics and Control Engineering, Changzhou University, Changzhou 213000, Jiangsu, China

Source: Electronics Optics & Control, 2023, No. 9, pp. 112-118 (7 pages)

Funding: Jiangsu Provincial Graduate Research and Innovation Project (KYCX22_3053).

Abstract: As the low-altitude airspace environment grows more complex, the probability of conflict among UAVs performing missions keeps increasing. The traditional reinforcement learning algorithms SAC and DDPG converge slowly and unstably when applied to collision avoidance among multiple UAVs in a limited airspace. To address these shortcomings, a multi-agent reinforcement learning (MARL) method based on the PPO2 algorithm is proposed. First, the multi-UAV flight decision-making problem is formulated as a Markov decision process. Second, the state space and reward function are designed, and the policy is optimized by maximizing the cumulative reward, which makes the overall training more stable and faster to converge. Finally, a flight simulation scenario is built on the deep learning framework TensorFlow and the reinforcement learning environment Gym, and simulation experiments are carried out. The experimental results show that, compared with the methods based on SAC and DDPG, the proposed method raises the collision avoidance success rate by about 37.74 and 49.15 percentage points respectively. It therefore handles collision avoidance among multiple UAVs better and is superior in both convergence speed and convergence stability.
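The abstract names PPO2 but does not state its objective. As a minimal sketch, assuming PPO2 here means the commonly used clipped-surrogate variant of PPO, the quantity that policy optimization maximizes can be illustrated as below. The function name `clipped_surrogate`, its parameters, and the default `clip_eps=0.2` are illustrative assumptions and are not taken from the paper.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    updated and the data-collecting policy. advantages: advantage estimates.
    All names and the default clip_eps are assumptions for illustration.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Clipping removes the incentive to move the ratio beyond 1 ± clip_eps in the direction that would improve the objective, which is one commonly cited reason for the stable training behaviour of this family of methods.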

Keywords: unmanned aerial vehicle (UAV); deep reinforcement learning (DRL); multi-agent; collision avoidance; PPO2

Classification code: V249 [Aerospace Science and Technology: Aircraft Design]
