Authors: 耿俊香 (GENG Junxiang); 姜静 (JIANG Jing); 魏胜楠 (WEI Shengnan); 段昶 (DUAN Chang) (Shenyang Ligong University, Shenyang 110159, China)
Affiliation: [1] School of Automation and Electrical Engineering, Shenyang Ligong University, Shenyang 110159
Source: Journal of Shenyang Ligong University (《沈阳理工大学学报》), 2021, No. 4, pp. 29-34 (6 pages)
Abstract: When the agents of a multi-agent system cooperate, a large number of agents makes the game relationships among them complex, and correct decisions cannot be made in time. Efficient communication is an effective means of multi-agent cooperation. This paper proposes CIDDPG, a communication-based algorithm for efficient information learning. It builds a communication mechanism on top of the multi-agent DDPG algorithm so that agents can exchange information. A scheduling module is added to the policy network of the DDPG algorithm to prune useless information and improve communication efficiency. An attention mechanism is introduced into the value network so that each agent selectively attends to information from other agents, which allows cooperative, competitive, and other interactions among agents to be carried out efficiently in complex environments. Experiments in two different scenarios show that CIDDPG obtains a higher average reward than the other algorithms compared and converges quickly.
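The abstract names two components, a scheduling module that prunes messages and an attention mechanism that weights the remaining ones, but gives no formulas for either. The sketch below is only an illustration under stated assumptions: the top-k gate (`schedule_top_k`), the scaled dot-product form of the attention (`attend`), and all toy values are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def schedule_top_k(importance, k):
    """Hypothetical scheduling gate: keep the k agents with the highest
    importance score and return their indices in ascending order."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])

def attend(query, keys, values):
    """Assumed scaled dot-product attention: weight each value by
    softmax(query . key / sqrt(d)) and return (weights, weighted sum)."""
    d = len(query)
    weights = softmax([dot(query, key) / math.sqrt(d) for key in keys])
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context

# Toy data: three messages from other agents and an importance score for each.
messages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
importance = [0.9, 0.1, 0.5]

selected = schedule_top_k(importance, k=2)   # prune the least important message
kept = [messages[i] for i in selected]
weights, context = attend([1.0, 0.0], kept, kept)
```

In this toy setup the gate discards the second message before attention is computed, so the value-side weighting only ever sees the messages that survived scheduling.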
Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]