Authors: CHEN Miao-yun; WANG Lei; DING Zhi-qiang
Affiliation: School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230031, China
Source: Computer Simulation, 2021, No. 2, pp. 301-307 (7 pages)
Funding: Chinese Academy of Sciences Innovation Fund (High-Tech Project CXJJ-17-M139); Chinese Academy of Sciences Major Special Project (KGFZD-135-18-027).
Abstract: Using behavior trees for decision-making in multi-agent simulation is intuitive and easily extensible, but the design process of a behavior tree is complex and manual debugging is inefficient. This paper introduces Q-Learning to realize automatic design of behavior trees. To address the slow convergence of traditional Q-Learning, the Metropolis criterion from simulated annealing is applied to the action selection strategy, adaptively reducing the selection probability of suboptimal actions as learning proceeds, and a dynamic programming idea is applied to the Q-value update strategy, updating Q values in reverse order over each episode. Experimental results show that the agent decision model based on the improved multi-step Q-Learning behavior tree converges faster, reduces the use of conditional nodes, and achieves automatic design and optimization of the behavior tree with more reasonable behavior decisions.
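The abstract names two concrete mechanisms: Metropolis-criterion action selection with an annealed temperature, and reverse-order (dynamic-programming style) multi-step Q-value updates. The sketch below illustrates both under assumptions not stated in the record: a tabular Q function over discrete states and actions, and a geometric cooling schedule. All identifiers (metropolis_select, backward_q_update, alpha, gamma, cooling) are illustrative, not the authors' implementation.

```python
import math
import random
from collections import defaultdict

def metropolis_select(q_table, state, actions, temperature):
    """Metropolis-criterion action selection: propose a random action and
    accept it over the greedy one with probability exp(delta_Q / T), so the
    chance of picking a suboptimal action shrinks as T is annealed."""
    greedy = max(actions, key=lambda a: q_table[(state, a)])
    proposal = random.choice(actions)
    delta = q_table[(state, proposal)] - q_table[(state, greedy)]
    if delta >= 0 or random.random() < math.exp(delta / temperature):
        return proposal
    return greedy

def backward_q_update(q_table, trajectory, actions, alpha=0.1, gamma=0.9):
    """Dynamic-programming style multi-step update: sweep a finished episode
    in reverse order, so rewards from later steps have already propagated
    when earlier state-action pairs are refreshed (one backward pass instead
    of waiting many episodes for values to diffuse)."""
    for state, action, reward, next_state in reversed(trajectory):
        best_next = max(q_table[(next_state, a)] for a in actions)
        td_error = reward + gamma * best_next - q_table[(state, action)]
        q_table[(state, action)] += alpha * td_error

# Illustrative learning loop with geometric cooling of the temperature.
q = defaultdict(float)
temperature, cooling = 1.0, 0.95
for episode in range(100):
    trajectory = []  # fill with (state, action, reward, next_state) tuples
    # ... run the agent, choosing actions via
    #     metropolis_select(q, state, actions, temperature)
    backward_q_update(q, trajectory, actions=[0, 1, 2])
    temperature = max(temperature * cooling, 1e-3)
```

Annealing the temperature makes early exploration broad and late behavior nearly greedy, which is the mechanism the abstract credits for reducing suboptimal action selection and speeding convergence.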
Keywords: multi-agent; behavior tree; simulated annealing; dynamic programming; multi-step Q-Learning improved with dynamic programming and simulated annealing
Classification: TP391 (Automation and Computer Technology / Computer Application Technology)