Author: 史忠植 (Shi Zhongzhi) [1]
Source: Chinese Science Bulletin (《科学通报》), 2016, No. 33, pp. 3548-3556 (9 pages)
Funding: Supported by the National Key Basic Research Development Program of China (2013CB329502)
Abstract: Learning ability is the fundamental characteristic of human intelligence. In March 2016, Google's AlphaGo, which combines deep neural networks with Monte Carlo tree search, defeated the world Go champion Lee Sedol of South Korea by 4 games to 1, a result that marks major progress in artificial intelligence. This paper focuses on the machine learning methods used by AlphaGo, including reinforcement learning, deep learning, and deep reinforcement learning, and analyzes the open problems and the latest research progress. To push beyond the limits of learning by machines, cognitive machine learning is proposed and possible research directions are listed, so that machine intelligence can keep evolving and gradually approach the human level.

Learning ability is the basic characteristic of human intelligence. The July 1, 2005 issue of Science published a list of 125 important questions in science; question 94 asks "What are the limits of learning by machines?", with the annotation "Computers can already beat the world's best chess players, and they have a wealth of information on the Web to draw on. But abstract reasoning is still beyond any machine." In recent years, artificial intelligence has made great progress. In 1997, IBM's supercomputer Deep Blue defeated the chess grandmaster Garry Kasparov. On February 14, 2011, IBM's Watson supercomputer won a practice round against Jeopardy! champions Ken Jennings and Brad Rutter. In March 2016, Google DeepMind's AlphaGo sealed a 4-1 victory over the South Korean Go grandmaster Lee Se-dol. This paper focuses on the machine learning methods of AlphaGo, including reinforcement learning, deep learning, and deep reinforcement learning, and analyzes the existing problems and the latest research progress. Deep reinforcement learning is the combination of deep learning and reinforcement learning, and it realizes learning algorithms that map directly from perception to action. Simply put, as with human behavior, sensory input such as vision is fed in, and actions are output directly by a deep neural network. Deep reinforcement learning has the potential to let robots learn a variety of skills and achieve full autonomy. Although reinforcement learning has been applied successfully, feature states have to be set manually, which is difficult for complex scenes, easily causes the curse of dimensionality, and limits representational power. In 2010, Sascha Lange and Martin Riedmiller proposed deep auto-encoder neural networks in reinforcement learning to extract features for visually guided control. In 2013, DeepMind proposed the deep Q-network (DQN) at NIPS 2013, which uses a convolutional neural network to extract features that are then applied in reinforcement learning.
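As a rough illustration of the DQN idea summarized in the abstract (a convolutional network extracts features from raw observations and is trained with Q-learning to map perception directly to action), the sketch below computes the standard one-step temporal-difference loss with an online network and a frozen target network. The network shape, hyperparameters, and the fake replay batch are assumptions chosen for the demo, not details taken from the paper.

# Minimal DQN-style TD-loss sketch (illustrative only; architecture and
# hyperparameters are assumptions, not taken from the paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Convolutional network mapping stacked frames to one Q-value per action."""
    def __init__(self, n_actions):
        super().__init__()
        self.conv1 = nn.Conv2d(4, 16, kernel_size=8, stride=4)   # 4 stacked 84x84 frames in
        self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)
        self.fc = nn.Linear(32 * 9 * 9, 256)
        self.out = nn.Linear(256, n_actions)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.fc(x.flatten(1)))
        return self.out(x)                                        # (batch, n_actions)

def dqn_loss(online, target, batch, gamma=0.99):
    """One-step TD loss: r + gamma * max_a' Q_target(s', a') vs. Q_online(s, a)."""
    s, a, r, s_next, done = batch
    q_sa = online(s).gather(1, a.unsqueeze(1)).squeeze(1)         # Q(s, a) for taken actions
    with torch.no_grad():                                         # target net gives a fixed bootstrap
        q_next = target(s_next).max(dim=1).values
    td_target = r + gamma * (1.0 - done) * q_next
    return F.smooth_l1_loss(q_sa, td_target)

if __name__ == "__main__":
    n_actions = 4
    online, target = QNetwork(n_actions), QNetwork(n_actions)
    target.load_state_dict(online.state_dict())                   # hard sync, done periodically in real training
    # A fake replay-buffer batch (states, actions, rewards, next states, done flags), demo only.
    batch = (torch.rand(8, 4, 84, 84),
             torch.randint(0, n_actions, (8,)),
             torch.rand(8),
             torch.rand(8, 4, 84, 84),
             torch.zeros(8))
    loss = dqn_loss(online, target, batch)
    loss.backward()
    print("TD loss:", loss.item())

In a full agent, this loss would be minimized over minibatches sampled from an experience-replay buffer, with the target network refreshed periodically; those two ingredients are what stabilize Q-learning when the value function is a deep network.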
Keywords: reinforcement learning; deep learning; deep reinforcement learning; cognitive machine learning; learning emergence; learning evolution
CLC Number: TP181 [Automation and Computer Technology - Control Theory and Control Engineering]