Authors: SUN Hui-hui (孙辉辉), HU Chun-he (胡春鹤) [1,3], ZHANG Jun-guo (张军国)
Affiliations: [1] School of Technology, Beijing Forestry University, Beijing 100083, China; [2] School of Mechanical and Electrical Engineering, North China Institute of Science and Technology, Langfang 065201, Hebei, China; [3] Key Lab of State Forestry and Grassland Administration for Forestry Equipment and Automation, Beijing 100083, China
Source: Control and Decision (《控制与决策》), 2023, No. 5, pp. 1420-1429 (10 pages)
Funding: National Natural Science Foundation of China (61703047); Science and Technology Research Project of Colleges and Universities in Hebei Province (QN2021312).
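Abstract (translated from the Chinese): Deep reinforcement learning has become a research hotspot in the multi-robot field because of its efficient performance in multi-robot systems. However, in unstructured scenarios that vary continuously over time and carry unknown risks, traditional methods show poor risk defense and fragile system security: unknown risks intrude nonlinearly on the multi-robot state space in the form of adversarial attacks. To address this problem, a multi-robot reinforcement learning method based on an active risk defense mechanism (APMARL) is proposed. First, based on a partially observable Markov game model, a risk discrimination mechanism with a memory pool shared among the robots is established; a risk state index is constructed to predict the safety of the current behavior in advance, and the matching risk handling mode is executed adaptively according to the prediction. In particular, for unsafe states with risk intrusion, an Actor-Critic active defense network architecture based on an enhanced attention mechanism is proposed, which enhances key information by level and defends effectively against dangerous information. Finally, extensive experiments on multi-robot cooperative confrontation tasks show that a reinforcement learning policy with the active risk defense mechanism reduces the intrusion risk of hostile information, improves the execution efficiency of multi-robot cooperative confrontation tasks, and enhances the stability and safety of the policy.

Abstract (authors' English version): Deep reinforcement learning (DRL) has become a hotspot in the field of multi-robot systems due to its efficient performance. However, when encountering unstructured environments with time-varying and unknown risks, traditional DRL methods expose the disadvantages of poor risk defense ability and fragile system security. The unknown risk brings nonlinear intrusion to the state space of multi-robot systems in the form of adversarial attacks, which poses a serious threat to the estimation of the robots' motion strategy. To solve this problem, this paper proposes a multi-agent reinforcement learning method based on an active risk defense mechanism (ARD-MARL). Firstly, based on the partially observable Markov game model, a risk discrimination mechanism with global communication information is established to predict the current behavior state. Secondly, in the strategy deployment stage, an event-triggered multi-risk processing scheme is built to apply the matching security strategy to each level of predicted risk. Then, for dangerous states with risk intrusion, an active defense Actor-Critic network architecture based on an enhanced attention mechanism is designed. By magnifying important information and restraining threat information, a safer and more efficient motion strategy is generated. Finally, extensive experiments are carried out on multi-agent cooperative and confrontation tasks. The results show that the multi-robot reinforcement learning method with the active security defense mechanism can effectively enhance stability and anti-risk ability, and improve the security of information transmission.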
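Keywords: deep reinforcement learning; multi-robot; risk defense; cooperative confrontation; event-driven
Classification: TP24 [Automation and Computer Technology: Detection Technology and Automatic Devices]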