Affiliation: [1] State Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China
Source: Journal of Nanjing University (Natural Science), 2001, No. 2, pp. 135-141 (7 pages)
Funding: National Natural Science Foundation of China (69905001); Doctoral Program Foundation of Institutions of Higher Education (97028428)
Abstract: AODE is an agent-oriented development environment for intelligent software systems developed by our group, and it adopts a reinforcement-learning-based negotiation model. The model describes negotiation in two parts: a Markov decision process describes negotiation as the environment state changes, and a sequential decision process describes negotiation within a given environment state; reinforcement learning is then applied to the agents' negotiation process. Evidence from theoretical analysis and observations of human interaction suggests that if a decision maker can take into account what other agents are thinking, and moreover learn how other agents behave from their interactions, its payoff may increase, so applying learning to agents' negotiation processes is receiving growing attention. Bazaar is a sequential decision-making model of negotiation in which learning is modeled as a Bayesian belief-update process; agents equipped with this learning mechanism can update their knowledge during interaction and have stronger negotiation capability than agents without it. Analysis of Bazaar, however, reveals two limitations: first, the absence of knowledge about different environment states makes the model inapplicable to negotiation in dynamic environments; second, Bazaar's negotiation strategy "always chooses the action that maximizes the expected payoff given the information available at this stage", which neglects the effect the chosen action has on subsequent states, so agents using this strategy may not reach an optimal solution in a dynamic environment. AODE instead adopts a Markov process to describe transitions among system states. According to meta-game theory, the optimal negotiation strategy is the meta-game equilibrium solution under a given model, so AODE chooses the meta-game Q-learning algorithm as its learning mechanism, which weighs both the utility in the current state and the possible effects on subsequent states. The negotiation model of AODE can thus describe multi-agent negotiation in dynamic environments; when all agents adopt the meta-game Q-learning algorithm, the system obtains the optimal negotiation solution in a dynamic negotiation environment.
Keywords: multi-agent systems; reinforcement learning; agent negotiation model; AODE; intelligent-system development environment; negotiation strategy
Classification: TP182 (Automation and Computer Technology — Control Theory and Control Engineering)
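The abstract outlines the approach but the record carries no code, so the following is a minimal, hypothetical Python sketch of tabular Q-learning over joint actions, in the spirit of the meta-game Q-learning described above. The class name, the (state, own action, opponent action) indexing, and all parameter defaults are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: tabular Q-learning conditioned on the opponent's action,
# loosely following the meta-game Q-learning idea in the abstract. Not the
# paper's actual algorithm; names and defaults are illustrative assumptions.
import random
from collections import defaultdict

class JointActionQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions   # this agent's own action set
        self.alpha = alpha       # learning rate
        self.gamma = gamma       # discount for effects on subsequent states
        self.epsilon = epsilon   # exploration rate
        # Q is indexed by (state, own_action, other_action): the "meta" aspect,
        # since value estimates are conditioned on what the other agent does.
        self.q = defaultdict(float)

    def choose(self, state, predicted_other):
        """Epsilon-greedy choice w.r.t. the predicted opponent action."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a, predicted_other)])

    def update(self, state, own, other, reward, next_state, predicted_next_other):
        """Q-learning backup: reward now plus discounted best follow-up value,
        so the chosen action's effect on later states is not neglected (the
        flaw the abstract attributes to Bazaar's myopic strategy)."""
        best_next = max(self.q[(next_state, a, predicted_next_other)]
                        for a in self.actions)
        key = (state, own, other)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
```

Indexing the table by the opponent's action rather than by the agent's own action alone is one simple way to let each agent "take into consideration what other agents are thinking", as the abstract puts it; a faithful implementation of meta-game equilibrium would go beyond this sketch.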