Research on an Agent Decision Model Based on a DP-SAMQ Behavior Tree (Cited by: 2)

Research on Agent Decision Model Based on Multi-Step Q-Learning Behavior Tree


Authors: 陈妙云 (CHEN Miao-yun); 王雷 (WANG Lei)[1]; 丁治强 (DING Zhi-qiang) — School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230031, China

Affiliation: [1] School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230031, China

Published in: Computer Simulation (《计算机仿真》), 2021, No. 2, pp. 301-307 (7 pages)

Funding: CAS Innovation Fund (High-Tech Project CXJJ-17-M139); CAS Major Special Project (KGFZD-135-18-027).

Abstract: Using behavior trees for decision-making in multi-agent simulation is intuitive and easily extensible, but the design process of a behavior tree is complex and manual tuning is inefficient. This paper introduces Q-Learning to automate behavior-tree design. To address the slow convergence of traditional Q-Learning, the Metropolis criterion from simulated annealing is applied to the action-selection strategy, adaptively changing the probability of selecting sub-optimal actions as learning proceeds, and the dynamic-programming idea is applied to the Q-value update strategy, updating Q values in reverse order. Experimental results show that the agent decision model based on the improved multi-step Q-Learning behavior tree converges faster, reduces the use of condition nodes, and achieves automatic design and optimization of behavior trees.
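The Metropolis-criterion action selection mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the uniform candidate sampling, and the geometric temperature schedule are all assumptions.

```python
import math
import random

def metropolis_select(q_values, temperature):
    """Metropolis-criterion action selection: default to the greedy action,
    but accept a randomly drawn sub-optimal candidate with probability
    exp(-(Q_best - Q_candidate) / T). As the temperature T is annealed
    toward zero, the chance of picking worse actions shrinks, trading
    early exploration for late exploitation."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    candidate = random.randrange(len(q_values))
    if candidate == best:
        return best
    delta = q_values[best] - q_values[candidate]  # Q gap, always >= 0
    if random.random() < math.exp(-delta / temperature):
        return candidate  # accept the worse action (exploration)
    return best           # keep the greedy action (exploitation)

# A geometric cooling schedule T_k = T0 * alpha**k is one common choice:
T0, alpha = 1.0, 0.98
for episode in range(3):
    T = T0 * alpha ** episode
    action = metropolis_select([0.2, 0.5, 0.1], T)
```

At a near-zero temperature the acceptance probability underflows to zero, so the selection becomes purely greedy.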

Keywords: multi-agent; behavior tree; simulated annealing; dynamic programming; multi-step Q-Learning improved with dynamic programming and simulated annealing (DP-SAMQ)

Classification: TP391 [Automation & Computer Technology — Computer Application Technology]
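The reverse-order Q-value update described in the abstract can be sketched as follows. This is a hedged sketch of the dynamic-programming idea, not the paper's code: the function name, the tabular dict-of-dicts Q representation, and the tiny two-state trajectory are illustrative assumptions.

```python
def backward_q_update(Q, trajectory, alpha=0.1, gamma=0.9):
    """Update Q over a finished episode in reverse order: the terminal
    step is updated first, so its reward propagates back along the whole
    trajectory in a single pass rather than one step per episode.
    Q: dict state -> dict action -> value (tabular Q-function).
    trajectory: list of (state, action, reward, next_state);
    next_state is None at the terminal step."""
    for state, action, reward, next_state in reversed(trajectory):
        target = reward
        if next_state is not None:
            target += gamma * max(Q[next_state].values())
        Q[state][action] += alpha * (target - Q[state][action])
    return Q

# Illustrative two-state episode: only the terminal step is rewarded.
Q = {"s0": {"a": 0.0}, "s1": {"a": 0.0}}
trajectory = [("s0", "a", 0.0, "s1"), ("s1", "a", 1.0, None)]
backward_q_update(Q, trajectory)
```

Because "s1" is updated before "s0", the terminal reward already reaches Q["s0"]["a"] after this single pass (here 0.1 * 0.9 * 0.1 = 0.009); a forward-order update would leave it at zero until a later episode.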

 
