Affiliation: [1] State Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China
Source: Journal of Nanjing University (Natural Science), 2001, No. 2, pp. 135-141 (7 pages)
Funding: National Natural Science Foundation of China (69905001); Doctoral Program Foundation of Institutions of Higher Education (97028428)
Abstract: AODE is an agent-oriented development environment for intelligent software systems developed by our group, and it adopts a reinforcement-learning-based negotiation model. The model describes negotiation in two parts: a Markov decision process describes negotiation as the environment state changes, and a sequential decision process describes negotiation within a given environment state; reinforcement learning is then applied to the agents' negotiation process. Evidence from theoretical analysis and observations of human interaction suggests that if a decision maker can take into account what other agents are thinking, and moreover learn how other agents behave from their interactions, its payoff may increase, so applying learning to agents' negotiation processes is receiving growing attention. Bazaar is a sequential decision-making model of negotiation in which learning is modeled as a Bayesian belief-update process; agents equipped with this learning mechanism can update their knowledge during interaction and have stronger negotiation capability than agents without it. Analysis of Bazaar, however, reveals two limitations: first, the absence of knowledge about different environment states makes the model inapplicable to negotiation in dynamic environments; second, Bazaar's negotiation strategy "always chooses the action that maximizes the expected payoff given the information available at this stage", which neglects the effect the chosen action has on subsequent states, so agents using this strategy may not reach an optimal solution in a dynamic environment. AODE instead adopts a Markov process to describe transitions among system states. According to meta-game theory, the optimal negotiation strategy is the meta-game equilibrium solution under a given model, so AODE chooses the meta-game Q-learning algorithm as its learning mechanism, which weighs both the utility in the current state and the possible effects on subsequent states. The negotiation model of AODE can thus describe multi-agent negotiation in dynamic environments; when all agents adopt the meta-game Q-learning algorithm, the system obtains the optimal negotiation solution in a dynamic negotiation environment.
Keywords: multi-agent systems; reinforcement learning; agent negotiation model; AODE; intelligent-system development environment; negotiation strategy
Classification: TP182 (Automation and Computer Technology — Control Theory and Control Engineering)
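The abstract outlines the approach but the record carries no code, so the following is a minimal, hypothetical Python sketch of tabular Q-learning over joint actions, in the spirit of the meta-game Q-learning described above. The class name, the (state, own action, opponent action) indexing, and all parameter defaults are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: tabular Q-learning conditioned on the opponent's action,
# loosely following the meta-game Q-learning idea in the abstract. Not the
# paper's actual algorithm; names and defaults are illustrative assumptions.
import random
from collections import defaultdict

class JointActionQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions   # this agent's own action set
        self.alpha = alpha       # learning rate
        self.gamma = gamma       # discount for effects on subsequent states
        self.epsilon = epsilon   # exploration rate
        # Q is indexed by (state, own_action, other_action): the "meta" aspect,
        # since value estimates are conditioned on what the other agent does.
        self.q = defaultdict(float)

    def choose(self, state, predicted_other):
        """Epsilon-greedy choice w.r.t. the predicted opponent action."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a, predicted_other)])

    def update(self, state, own, other, reward, next_state, predicted_next_other):
        """Q-learning backup: reward now plus discounted best follow-up value,
        so the chosen action's effect on later states is not neglected (the
        flaw the abstract attributes to Bazaar's myopic strategy)."""
        best_next = max(self.q[(next_state, a, predicted_next_other)]
                        for a in self.actions)
        key = (state, own, other)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
```

Indexing the table by the opponent's action rather than by the agent's own action alone is one simple way to let each agent "take into consideration what other agents are thinking", as the abstract puts it; a faithful implementation of meta-game equilibrium would go beyond this sketch.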