Multi-USVs Cooperative Policy Optimization Method Based on Disturbed Input of Advantage Function

Authors: REN Lu, KE Ya-Nan, LIU Wen-Zhang, MU Chao-Xu, SUN Chang-Yin

Affiliations: [1] School of Artificial Intelligence, Anhui University, Hefei 230601; [2] Anhui Provincial Key Laboratory of Security Artificial Intelligence, Hefei 230601; [3] School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072

Source: Acta Automatica Sinica (《自动化学报》), 2025, No. 4, pp. 824-834 (11 pages)

Funding: Supported by the National Natural Science Foundation of China (62303009).

Abstract: Cooperative navigation of multiple unmanned surface vehicles (Multi-USVs) is crucial for efficient maritime operations. However, handling the complex cooperative relationships among the vehicles and enabling autonomous cooperative decision-making in open, unknown sea areas remain pressing open problems. In recent years, multi-agent reinforcement learning (MARL) has shown great potential for complex multi-agent decision-making problems and has been widely applied to Multi-USV cooperative navigation tasks. Nevertheless, such data-driven methods often suffer from low exploration efficiency, difficulty in balancing exploration and exploitation, and a tendency to get stuck in local optima. Therefore, within the centralized training and decentralized execution (CTDE) framework, this paper injects perturbations into the input of the advantage function to improve its generalization ability, and proposes a novel noise-advantage multi-agent proximal policy optimization (NA-MAPPO) method that improves the exploration efficiency of the Multi-USV cooperative policy. Experimental results show that, compared with existing benchmark algorithms, the proposed method effectively increases the success rate of Multi-USV cooperative navigation tasks and shortens both policy training time and task completion time, thereby improving the cooperative exploration efficiency of the Multi-USV system and preventing the policy from falling into local optima.
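To make the abstract's core idea concrete, the following is a minimal, hypothetical Python sketch, not the authors' NA-MAPPO implementation. It assumes the perturbation is i.i.d. Gaussian noise added to the centralized critic's input state before the value estimates used in generalized advantage estimation (GAE) are computed, and that the resulting advantages feed the standard PPO clipped surrogate objective used by MAPPO. The linear critic, the noise scale `sigma`, and the function names `linear_critic`, `perturb`, `noisy_gae`, and `ppo_clip_loss` are illustrative assumptions; the paper's network architecture, noise model, and hyperparameters are not stated in this record.

```python
import math
import random
from typing import List, Sequence


def linear_critic(state: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical centralized critic V(s) = w . s (stand-in for a neural network)."""
    return sum(w * x for w, x in zip(weights, state))


def perturb(state: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """ASSUMPTION: add i.i.d. Gaussian noise N(0, sigma^2) to each critic input feature."""
    return [x + rng.gauss(0.0, sigma) for x in state]


def noisy_gae(states, rewards, dones, weights, sigma=0.1, gamma=0.99, lam=0.95, seed=0):
    """GAE advantages computed from critic values evaluated on perturbed inputs.

    `states` holds len(rewards) + 1 global states; the last one is the bootstrap state.
    """
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    values = [linear_critic(perturb(s, sigma, rng), weights) for s in states]
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), cut at episode ends
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    return advantages


def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over samples."""
    total = 0.0
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|o) / pi_old(a|o)
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic (lower) bound
    return -total / len(advantages)


if __name__ == "__main__":
    # Toy 3-step episode with 2-dimensional global states (illustrative values only).
    states = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 0.0]]
    rewards = [0.0, 0.0, 1.0]
    dones = [False, False, True]
    weights = [0.2, 0.4]
    adv = noisy_gae(states, rewards, dones, weights, sigma=0.1)
    loss = ppo_clip_loss([-0.9, -1.1, -1.0], [-1.0, -1.0, -1.0], adv)
    print(adv, loss)
```

Setting `sigma=0.0` recovers ordinary GAE, which is a convenient sanity check; a larger `sigma` injects more randomness into the advantage estimates that weight each policy update, while the actors' decentralized execution-time inputs are left untouched.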

Keywords: Multi-USV cooperation; proximal policy optimization; multi-agent reinforcement learning; input perturbation

CLC number: TP18 [Automation and Computer Technology / Control Theory and Control Engineering]