Cooperative countermeasure strategy based on active risk defense multi-agent reinforcement learning

Times cited: 2

Authors: SUN Hui-hui (孙辉辉), HU Chun-he (胡春鹤) [1,3], ZHANG Jun-guo (张军国)

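Affiliations: [1] School of Technology, Beijing Forestry University, Beijing 100083, China; [2] School of Mechanical and Electrical Engineering, North China Institute of Science and Technology, Langfang, Hebei 065201, China; [3] Key Lab of State Forestry and Grassland Administration for Forestry Equipment and Automation, Beijing 100083, China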

Source: Control and Decision (《控制与决策》), 2023, No. 5, pp. 1420-1429 (10 pages)

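Funding: National Natural Science Foundation of China (61703047); Science and Technology Research Project of Higher Education Institutions of Hebei Province (QN2021312).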

Abstract: Deep reinforcement learning (DRL) has become a research hotspot in multi-robot systems because of its efficient performance. However, in unstructured environments with continuously time-varying, unknown risks, traditional DRL methods suffer from poor risk defense and fragile system security: unknown risks intrude nonlinearly into the state space of the multi-robot system in the form of adversarial attacks, seriously threatening the estimation of the robots' motion strategies. To address this problem, this paper proposes a multi-agent reinforcement learning method based on an active risk defense mechanism (ARD-MARL). First, based on a partially observable Markov game model, a risk discrimination mechanism is established in which the robots share a common memory pool and exchange global communication information; it constructs a risk state index to predict in advance whether the current behavior is safe. Second, in the strategy deployment stage, an event-triggered multi-risk processing scheme adaptively applies the security strategy that matches each level of predicted risk. Then, for unsafe states with risk intrusion, an active defense Actor-Critic network architecture based on an enhanced attention mechanism is designed; by enhancing important information hierarchically and suppressing threat information, it generates safer and more efficient motion strategies. Finally, extensive experiments on multi-agent cooperative and confrontation tasks show that the reinforcement learning strategy with the active risk defense mechanism effectively reduces the intrusion risk of adversarial information, improves the efficiency of multi-robot cooperative confrontation tasks, enhances the stability, safety, and anti-risk ability of the strategy, and improves the security of information transmission.
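The abstract describes a risk state index computed from a memory pool shared by the robots and an event-triggered scheme that selects a risk processing mode, but it gives neither the index formula nor the thresholds. The Python sketch below only illustrates the general shape of such a pipeline under loudly stated assumptions: the index is a hypothetical mean absolute z-score of a new observation against the pooled observations, the three modes and both thresholds are invented for illustration, and the "event" is simply a change of mode. It is not the authors' ARD-MARL implementation.

```python
import math
from dataclasses import dataclass, field
from enum import Enum


class RiskMode(Enum):
    """Hypothetical risk-processing modes (illustrative, not from the paper)."""
    SAFE = "execute the nominal policy"
    CAUTION = "monitor and log the observation"
    DEFEND = "hand over to an attention-based defense network"


@dataclass
class SharedMemoryPool:
    """Hypothetical pool of observation vectors shared by all robots."""
    dim: int
    samples: list = field(default_factory=list)

    def push(self, obs):
        if len(obs) != self.dim:
            raise ValueError("observation has the wrong dimension")
        self.samples.append(list(obs))

    def mean_std(self):
        """Per-dimension mean and population standard deviation."""
        if not self.samples:
            raise ValueError("pool is empty")
        n = len(self.samples)
        means = [sum(s[i] for s in self.samples) / n for i in range(self.dim)]
        stds = []
        for i in range(self.dim):
            var = sum((s[i] - means[i]) ** 2 for s in self.samples) / n
            stds.append(math.sqrt(var) + 1e-8)  # avoid division by zero
        return means, stds


def risk_index(pool, obs):
    """Hypothetical risk state index: mean absolute z-score of obs."""
    means, stds = pool.mean_std()
    return sum(abs(o - m) / s for o, m, s in zip(obs, means, stds)) / pool.dim


def select_mode(index, caution=1.0, defend=3.0):
    """Map a risk index to a mode using two hypothetical thresholds."""
    if index >= defend:
        return RiskMode.DEFEND
    if index >= caution:
        return RiskMode.CAUTION
    return RiskMode.SAFE


class EventTriggeredHandler:
    """Reports a trigger event only when the selected mode changes."""

    def __init__(self):
        self.mode = RiskMode.SAFE

    def update(self, index):
        new_mode = select_mode(index)
        triggered = new_mode != self.mode
        self.mode = new_mode
        return new_mode, triggered


if __name__ == "__main__":
    pool = SharedMemoryPool(dim=2)
    for obs in ([0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]):
        pool.push(obs)
    print(select_mode(risk_index(pool, [0.5, 0.5])).name)  # SAFE
    print(select_mode(risk_index(pool, [5.0, 5.0])).name)  # DEFEND
```

Triggering only on a mode change, rather than re-dispatching at every step, is one plausible reading of "event-triggered"; the paper may define its trigger condition differently.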

Keywords: deep reinforcement learning; multi-robot; risk defense; cooperative confrontation; event-driven

CLC number: TP24 [Automation and Computer Technology: Detection Technology and Automatic Equipment]
