Affiliation: [1] School of Information and Mathematics, Yangtze University, Jingzhou 434023, Hubei, China
Source: Journal of Yangtze University (Natural Science Edition), 2017, No. 21, pp. 40-44 (5 pages)
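Funding: National Natural Science Foundation of China (61503047); Yangtze University Undergraduate Innovation and Entrepreneurship Training Program (2016123)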
Abstract: Reinforcement learning has been widely used in recent years for agents that play games automatically, but it copes poorly when the state or action space becomes too large. Deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning, and can therefore handle complex environments effectively. In this work, reinforcement learning is combined with deep learning, and the optimal policy is learned step by step through an improved Markov decision process. The most valuable state in the current environment is identified first, which yields the action that maximizes the cumulative reward. A computer is then trained with the deep reinforcement learning method to complete a simple game automatically, and the control-variable method is used to analyze separately how the number of iterations and the difficulty of the game affect the game score. The experimental results show that, with the external environment held fixed, accuracy increases as the number of training iterations increases or as the game difficulty decreases. This confirms that the agent can be trained more effectively by changing external factors, and can eventually reach the optimal result.
Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]
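
As a rough illustration of the idea in the abstract (learning state values through a Markov decision process and then choosing the action with the largest cumulative reward), the following is a minimal sketch only. It is not the authors' method: it swaps the deep network for a plain Q-table, and the six-state chain environment, the function names (`step`, `train`, `greedy_policy`) and all hyperparameter values are assumptions made up for this example.

```python
import random

# Hypothetical toy environment (not from the paper): a chain of states 0..5,
# where the agent starts at state 0 and the episode ends at state 5.
N_STATES = 6
ACTIONS = (-1, +1)      # move left or move right
GOAL = N_STATES - 1


def step(state, action):
    """Apply an action; the reward is 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)            # explore
            else:
                best = max(q[state])            # exploit, random tie-break
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bellman target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Action with the highest learned value in each non-terminal state."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(GOAL)]
```

Under these assumptions, `greedy_policy(train(200))` moves right in every state, which is the shortest path to the goal. A deep variant would replace the table `q` with a neural network that estimates the same values from raw observations.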