Event-triggered multi-agent hierarchical safe reinforcement learning for motion planning

Multi-agent event triggered hierarchical security reinforcement learning

Authors: SUN Hui-hui; HU Chun-he [1,3]; ZHANG Jun-guo (School of Technology, Beijing Forestry University, Beijing 100083, China; School of Mechanical and Electrical Engineering, Huainan Normal University, Huainan 232038, China; State Key Laboratory of Efficient Production of Forest Resources, Beijing 100083, China)

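Affiliations: [1] School of Technology, Beijing Forestry University, Beijing 100083; [2] School of Mechanical and Electrical Engineering, Huainan Normal University, Huainan, Anhui 232038; [3] State Key Laboratory of Efficient Production of Forest Resources, Beijing 100083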

Source: Control and Decision (《控制与决策》), 2024, No. 11, pp. 3755-3762 (8 pages)

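Funding: National Natural Science Foundation of China (61703047); Science and Technology Research Project of Higher Education Institutions of Hebei Province (QN2021312).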

Abstract: To address the action-safety problem that arises in the sequential decision-making process of deep reinforcement learning, this paper studies a motion planning method based on multi-agent event triggered hierarchical security reinforcement learning (MEHSRL). First, based on the constrained Markov decision model, a multi-agent twin delayed deep deterministic policy gradient framework with safety constraints is constructed. The framework uses state safety events as trigger conditions, so that motion policies are learned hierarchically over different state spaces. Then, a Lyapunov evaluation network is introduced to establish a target action selection mechanism with conditional constraints, and the Lagrange multiplier method is used to handle the difficulty of solving the multi-objective constrained problem, which ensures the safety of each robot's internal decisions. Finally, the proposed method is tested in multi-robot safe reinforcement learning scenarios. The results show that the method restores the robots' state trajectories from dangerous states to the safe space quickly, within a limited time, that it improves both the safety of the policy and the cooperative multi-robot motion planning capability, and that its motion planning performance is better than that of the comparison methods.

Keywords: reinforcement learning; safety constraints; motion planning; multi-agent; event-triggered

Classification: TP24 [Automation and Computer Technology: Detection Technology and Automatic Devices]
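
The abstract names two mechanisms without giving details: an event trigger that switches between policy levels, and a Lagrange multiplier treatment of the safety constraint. The following is only a minimal sketch of those two general ideas on a toy scalar problem, not the authors' algorithm. The function names, the reward `-(a - 2)^2`, the cost `c(a) = a`, the cost limit, and the step sizes are all assumptions made for illustration.

```python
def select_policy_level(safety_cost, threshold):
    """Hypothetical event trigger: switch to a recovery-level policy
    when the estimated safety cost exceeds the threshold."""
    return "recovery" if safety_cost > threshold else "task"


def lagrangian_primal_dual(steps=5000, lr_a=0.01, lr_lam=0.01, limit=1.0):
    """Toy constrained problem (assumed, not from the paper):
        maximize  r(a) = -(a - 2)^2   subject to  c(a) = a <= limit.
    Lagrangian: L(a, lam) = r(a) - lam * (c(a) - limit), with lam >= 0.
    """
    a, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: gradient ascent on L with respect to the action a.
        grad_a = -2.0 * (a - 2.0) - lam
        a += lr_a * grad_a
        # Dual step: raise lam while the constraint is violated,
        # then project back onto lam >= 0.
        lam = max(0.0, lam + lr_lam * (a - limit))
    return a, lam


if __name__ == "__main__":
    a_star, lam_star = lagrangian_primal_dual()
    print(f"action={a_star:.3f}, multiplier={lam_star:.3f}")
    print(select_policy_level(safety_cost=0.8, threshold=0.5))
```

In this toy case the unconstrained optimum `a = 2` violates the limit, so the multiplier grows until the action settles on the constraint boundary. In a deep reinforcement learning setting the same primal and dual steps would be applied to network parameters and to an estimated cost, which this sketch does not attempt.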
