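Affiliations: [1] School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013; [2] Department of Automation, Xiamen University, Xiamen, Fujian 361005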
Source: Chinese Journal of Computers (《计算机学报》), 2018, No. 1, pp. 28-46 (19 pages)
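Funding: Supported by the National Natural Science Foundation of China (61375070; 61562033; 61772442; 71361011), the Jiangxi Provincial Social Science Planning Fund (16GJ20), and the Jiangxi Provincial Natural Science Foundation (20171BAB202022).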
Abstract: Multiagent sequential decision-making under uncertainty is an important research issue in the area of artificial intelligence, and mainly concerns how agents should optimize their own decisions while interacting with other agents. In particular, in a partially observable stochastic game setting, agents cannot perceive the true state of the external environment and must rely on received observations to infer the hidden state. Meanwhile, agents' actions are themselves stochastic and directly influence the decisions of other agents. Agents interact mainly through their effects on the state of the common environment, which determines the rewards each receives for its actions. Hence the core task is to model the agents' interactions and subsequently to solve the model. Most existing research models the entire multiagent system and follows a mechanism of centralized planning and decentralized control: it first computes a joint policy for all agents and then assigns the local policies to the individual agents for execution. This approach often requires all agents to hold common knowledge of the global environment, so it generally applies only to cooperative multiagent systems.
In contrast, the interactive dynamic influence diagram (I-DID), which takes the perspective of an individual decision maker, provides a general framework for modeling multiagent sequential decision problems under uncertainty and removes limitations of traditional game-theoretic approaches to multiagent decision-making. The main difficulty in solving I-DIDs arises from the complicated process by which agents model one another. In particular, in a competitive setting agents have no opportunity to communicate and cannot know the true models of other agents, so they must predict and reason about other agents' behavior in order to optimize their own decisions. The solution is to first hypothesize a number of candidate models for the other agents and then solve those models to predict their behavior. Since the number of candidate models is often large and model uncertainty grows as decision time advances, the set of possible models grows exponentially, which makes solving I-DIDs extremely difficult. Building on the large body of existing I-DID research, this paper summarizes the model's representations and solution methods and, on that basis, proposes a new solution method. ...
Classification No.: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]