Opponent cart-pole dynamics for reinforcement learning of competing agents  

Chinese title (translated): Adversarial cart-pole dynamic system for competitive multi-agent reinforcement learning

Author: Xun Huang (黄迅), State Key Laboratory of Turbulence and Complex Systems, College of Engineering, Peking University, Beijing 100871, China

Affiliation: [1] State Key Laboratory of Turbulence and Complex Systems, College of Engineering, Peking University, Beijing 100871, China

Source: Acta Mechanica Sinica (力学学报, English edition), 2022, No. 5, pp. 125-134, I0003 (11 pages in total)

Funding: Supported by the National Science Foundation of China (Grant No. 91852201).

Abstract: In this work, the classical single cart-pole dynamic system is extended to a double cart-pole dynamic system that includes a competing target, which enables the study of multi-agent deep learning problems at an affordable cost. The corresponding key issues, such as the system dynamics, the reward function and the simultaneous training of opponent agents, are discussed in detail. To showcase the system dynamics, a pair of agents is trained, and analysis of the competition results reveals the key pattern for winning. It appears that a defensive agent is always defeated by an offensive agent, even though the associated neural networks have very limited intelligence. When both agents are defensive, the system dynamics remain stable and reach a Nash equilibrium. Overall, the proposed dynamic system could serve as a surrogate model and assist the study of how to escape the so-called Thucydides trap.

Chinese abstract (translated): Targeting reinforcement learning for competing multi-agent systems, this paper designs and proposes an adversarial cart-pole (inverted pendulum) dynamic system and gives its implementation details. The study reveals the following for two cart-pole systems with identical dynamics: when both controlling agents are in a defensive posture, both dynamic systems remain stable; when one agent takes an offensive posture, the other agent will inevitably lose stability, no matter how well it defends; when both agents are offensive, each wins about half of the time, and victory is accompanied by rapid destabilization of both dynamic systems.
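To make the setting concrete, the sketch below steps two identical classical cart-poles in lockstep and scores them with a zero-sum rule. It is a minimal illustration under stated assumptions, not the paper's system: the equations of motion and constants are the classical single cart-pole values (as used in Gym's CartPole environment), the opponent coupling and the trained neural-network agents described in the abstract are not reproduced, and the names `CartPole`, `defensive_policy`, `zero_sum_reward` and `play`, as well as the +1/-1 reward, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

# Classical cart-pole constants (Gym CartPole values; assumed, not from the paper).
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5                        # half of the pole length (m)
POLE_MASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG = 10.0                         # magnitude of the bang-bang push (N)
TAU = 0.02                               # integration time step (s)
X_LIMIT = 2.4                            # cart position limit (m)
THETA_LIMIT = 12 * 2 * math.pi / 360     # pole angle limit (rad)


@dataclass
class CartPole:
    """State of one hypothetical cart-pole: position, velocity, angle, angular rate."""
    x: float = 0.0
    x_dot: float = 0.0
    theta: float = 0.0
    theta_dot: float = 0.0

    def step(self, push_right: bool) -> None:
        """Advance one explicit Euler step under a bang-bang horizontal force."""
        force = FORCE_MAG if push_right else -FORCE_MAG
        cos_t, sin_t = math.cos(self.theta), math.sin(self.theta)
        temp = (force + POLE_MASS_LENGTH * self.theta_dot ** 2 * sin_t) / TOTAL_MASS
        theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
            HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
        )
        x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
        self.x += TAU * self.x_dot
        self.x_dot += TAU * x_acc
        self.theta += TAU * self.theta_dot
        self.theta_dot += TAU * theta_acc

    def failed(self) -> bool:
        """The system is lost once the cart or the pole leaves its allowed range."""
        return abs(self.x) > X_LIMIT or abs(self.theta) > THETA_LIMIT


def defensive_policy(pole: CartPole) -> bool:
    """Hypothetical hand-written stabilizer: push right when angle plus damped rate is positive."""
    return pole.theta + 0.5 * pole.theta_dot > 0.0


def zero_sum_reward(me: CartPole, opponent: CartPole) -> float:
    """Hypothetical competitive reward: +1 if only the opponent fails, -1 if only I fail."""
    if opponent.failed() and not me.failed():
        return 1.0
    if me.failed() and not opponent.failed():
        return -1.0
    return 0.0


def play(a: CartPole, b: CartPole,
         policy_a: Callable[[CartPole], bool],
         policy_b: Callable[[CartPole], bool],
         max_steps: int = 500) -> Tuple[int, float]:
    """Step both systems simultaneously; return (steps taken, reward for agent a)."""
    for t in range(1, max_steps + 1):
        a.step(policy_a(a))
        b.step(policy_b(b))
        if a.failed() or b.failed():
            return t, zero_sum_reward(a, b)
    return max_steps, 0.0
```

For instance, `play(CartPole(), CartPole(), defensive_policy, lambda p: True)` pits the stabilizing rule against a cart that is always pushed to the right; the latter loses its pole within roughly ten steps, so agent `a` receives +1. Two identical deterministic players follow identical trajectories and therefore always score 0, which loosely mirrors, but does not reproduce, the symmetric outcome reported above.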

Keywords: Cart-pole dynamics; Reinforcement learning; Thucydides trap; Inverted pendulum

CLC classification: O342 [Natural sciences > Solid mechanics]
