Author: Xun Huang 黄迅 (State Key Laboratory of Turbulence and Complex Systems, College of Engineering, Peking University, Beijing 100871, China)
Source: Acta Mechanica Sinica (力学学报(英文版)), 2022, No. 5, pp. 125-134, I0003 (11 pages in total)
Funding: Supported by the National Science Foundation of China (Grant No. 91852201).
Abstract: In this work, the classical single cart-pole dynamic system is extended to a double cart-pole dynamic system that includes a competing target, which enables the study of multi-agent deep learning problems at an affordable cost. The corresponding key issues, such as system dynamics, the reward function, and the simultaneous training of opponent agents, are discussed in detail. To showcase the system dynamics, a pair of agents is trained, and analysis of the competition results reveals the key pattern for winning. It appears that a defensive agent is always defeated by an offensive agent, even though the associated neural network has very limited intelligence. When both agents are defensive, the system dynamics remain stable and reach the Nash equilibrium. Overall, the proposed dynamic system could serve as a surrogate model and assist the study of how to escape the so-called Thucydides trap.
Abstract (translated from the Chinese): This paper addresses reinforcement learning for competing multi-agent systems, proposes an inverted-pendulum dynamic system with adversarial interaction, and gives the implementation details. The study shows that, for two inverted-pendulum systems with identical dynamics, both systems remain stable when the two controlling agents are defensive; when one agent is on the offensive, the other agent inevitably loses stability no matter how well it defends; and when both agents are on the offensive, each wins about half the time, with every win accompanied by rapid destabilization of both dynamic systems.
Keywords: Cart-pole dynamics; Reinforcement learning; Thucydides trap; Inverted pendulum