A multi-agent reinforcement learning algorithm based on improved DDPG in the Actor-Critic framework  (Cited by: 23)

Authors: CHEN Liang; LIANG Chen; ZHANG Jing-yi; LIU Yun-ting (College of Automation and Electrical Engineering, Shenyang Ligong University, Shenyang 110159, China)

Affiliation: [1] College of Automation and Electrical Engineering, Shenyang Ligong University, Shenyang 110159

Source: Control and Decision (《控制与决策》), 2021, No. 1, pp. 75-82 (8 pages)

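Funding: National Key R&D Program of China (2017YFC0821004, 2017YFC0821001); Natural Science Foundation of Liaoning Province (20170540788); Basic Scientific Research Project of the Education Department of Liaoning Province (LG201707).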

Abstract: Real-world artificial intelligence (AI) applications often require multiple agents to work together, and effective communication and coordination among artificial agents is an indispensable step toward artificial general intelligence. A self-developed virtual environment for police officer training serves as the test scenario, in which the tasks require multiple squads of agents of different unit types to cooperate with or compete against one another. To keep communication effective and scalable, a mixed deep deterministic policy gradient (Mi-DDPG) algorithm is proposed. First, a bidirectional recurrent neural network (BRNN) is added to the Actor network as the information exchange layer among agents of the same type. Then, information about agents of other types is added to the Critic network to learn a multi-agent cooperation strategy. In addition, to ease the training burden, a centralized-training, decentralized-execution framework is adopted, and the Q function in the Critic network is modularized. In experiments, Mi-DDPG is compared with other algorithms in different scenarios and shows a clear improvement in convergence speed and task completion, indicating potential value for real-world applications.
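
As a rough illustration of the Actor-side idea in the abstract, the sketch below passes the observations of same-type agents through a bidirectional RNN, so that each agent's action depends on its teammates' observations as well as its own. The class name `BrnnActor`, the tanh cells, the layer sizes, and the random untrained weights are all assumptions made for illustration. The abstract does not specify the paper's network details, and neither the Critic nor any DDPG update is shown.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix as a list of rows (hypothetical initialisation)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-based matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_pass(inputs, w_in, w_rec, hidden_size):
    """Run a plain tanh RNN over the agents, one recurrent step per agent."""
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


class BrnnActor:
    """Illustrative actor: a bidirectional RNN shared by same-type agents."""

    def __init__(self, obs_size, hidden_size, act_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # Separate weights for the forward and backward directions.
        self.w_in_f = make_matrix(hidden_size, obs_size, rng)
        self.w_rec_f = make_matrix(hidden_size, hidden_size, rng)
        self.w_in_b = make_matrix(hidden_size, obs_size, rng)
        self.w_rec_b = make_matrix(hidden_size, hidden_size, rng)
        # Output layer reads the concatenated forward/backward states.
        self.w_out = make_matrix(act_size, 2 * hidden_size, rng)

    def act(self, observations):
        """Map a list of per-agent observations to a list of per-agent actions."""
        fwd = rnn_pass(observations, self.w_in_f, self.w_rec_f, self.hidden_size)
        bwd = rnn_pass(observations[::-1], self.w_in_b, self.w_rec_b,
                       self.hidden_size)[::-1]
        actions = []
        for h_f, h_b in zip(fwd, bwd):
            out = matvec(self.w_out, h_f + h_b)
            actions.append([math.tanh(o) for o in out])  # bounded in [-1, 1]
        return actions


actor = BrnnActor(obs_size=3, hidden_size=4, act_size=2)
team_obs = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
team_actions = actor.act(team_obs)
```

Because the recurrence runs along the agent axis rather than along time, the same weights serve a team of any size, which is one way to read the scalability claim in the abstract.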

Keywords: reinforcement learning; deep learning; multi-agent; RNN; DDPG; Actor-Critic

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
