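Applicability of reinforcement learning methods to multi-UAV target searching in a simulated communication-denied battlefield environment (强化学习方法在通信拒止战场仿真环境中多无人机目标搜寻问题上的适用性研究)
Cited by: 10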

Feasibility of reinforcement learning for UAV-based target searching in a simulated communication denied environment

Authors: Liang WANG (汪亮) [1]; Wen WANG (王文) [1]; Yuyou WANG (王禹又); Songlin HOU (侯松林); Yuzhe QIAO (乔裕哲); Tianheng WU (吴天珩); Xianping TAO (陶先平) [1]

Affiliation: [1] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

Source: Scientia Sinica (Informationis) (《中国科学：信息科学》), 2020, No. 3, pp. 375-395 (21 pages)

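Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project, 2018 (Grant No. 2018AAA0102302); project funded by the Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University.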

Abstract: Target searching is crucial in real-world scenarios such as search and rescue at disaster sites and battlefield target reconnaissance. Unmanned aerial vehicles (UAVs) are an ideal technical solution for target searching in large-scale, high-risk areas because they are agile, low cost, able to carry different sensors, and able to collaborate as a swarm. In complex scenarios such as battlefields, the lack of communication infrastructure and the presence of intensive interference mean that UAVs often operate in communication-denied environments, where fast and reliable communication among UAVs and between UAVs and ground operators is difficult to establish. In such conditions, UAVs must be able to complete tasks autonomously, intelligently, and cooperatively, without receiving real-time commands from the operators. With the rapid advances in artificial intelligence, reinforcement learning has shown strong potential for solving sequential decision-making problems. The target searching problem studied in this paper falls into this category and is therefore a suitable candidate for reinforcement learning. However, whether current reinforcement learning and artificial intelligence technologies can enable UAVs to make autonomous decisions and complete tasks in realistic scenarios is still disputed and requires further investigation.
As a pilot study in this direction, this paper models the target searching problem in a communication-denied battlefield with two opposing sides and builds an adversarial simulation platform based on this model. Experiments are conducted to answer the following questions: (1) Can reinforcement learning be applied to multi-UAV target searching in communication-denied environments? (2) What are the advantages and disadvantages of different reinforcement learning algorithms in solving this problem? (3) How does the degree of communication denial influence the performance of these algorithms? Current mainstream reinforcement learning techniques are used in the simulations, and the results are evaluated quantitatively, leading to the following observations: (1) reinforcement learning can effectively solve the multi-UAV target searching problem in communication-denied environments; (2) when competing against other algorithms, autonomous UAV swarms based on Deep Q-Network (DQN) reinforcement learning show strong problem-solving ability; (3) the degree of communication denial does affect the performance of reinforcement learning algorithms, but their performance remains relatively stable across different degrees of communication denial.

Keywords: unmanned aerial vehicle (UAV); reinforcement learning; target searching; communication-denied environment

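Classification codes: V279 [Aerospace Science and Technology: Aircraft Design]; TP181 [Automation and Computer Technology: Control Theory and Control Engineering]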
