Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning

作　　者：DONG Yubo CUI Tao ZHOU Yufan SONG Xun ZHU Yue DONG Peng 董玉博;崔涛;周禹帆;宋勋;祝月;董鹏(School of Aeronautics and Astronautics,Shanghai Jiao Tong University,Shanghai,200240,China;Beijing Institute of Electronic System Engineering,Beijing,100854,China)

机构地区：[1]School of Aeronautics and Astronautics,Shanghai Jiao Tong University,Shanghai,200240,China [2]Beijing Institute of Electronic System Engineering,Beijing,100854,China

出　　处：《Journal of Shanghai Jiaotong university(Science)》2024年第4期646-655,共10页上海交通大学学报（英文版）

基　　金：National Natural Science Foundation of China(Nos.61803260,61673262 and 61175028)。

摘　　要：Multi-agent reinforcement learning has recently been applied to solve pursuit problems.However,it suffers from a large number of time steps per training episode,thus always struggling to converge effectively,resulting in low rewards and an inability for agents to learn strategies.This paper proposes a deep reinforcement learning(DRL)training method that employs an ensemble segmented multi-reward function design approach to address the convergence problem mentioned before.The ensemble reward function combines the advantages of two reward functions,which enhances the training effect of agents in long episode.Then,we eliminate the non-monotonic behavior in reward function introduced by the trigonometric functions in the traditional 2D polar coordinates observation representation.Experimental results demonstrate that this method outperforms the traditional single reward function mechanism in the pursuit scenario by enhancing agents’policy scores of the task.These ideas offer a solution to the convergence challenges faced by DRL models in long episode pursuit problems,leading to an improved model training performance.

关键词：multi-agent reinforcement learning deep reinforcement learning(DRL) long episode reward function

分类号：TP242[自动化与计算机技术—检测技术与自动化装置]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索