Authors: Yazhou Hu, Fengzhen Tang, Jun Chen, Wenxue Wang
Affiliations: [1] College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China; [2] The State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, Liaoning 110016, China; [3] Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, Liaoning 110169, China
Source: Control Theory and Technology (控制理论与技术, English edition), 2021, Issue 4, pp. 455-464 (10 pages)
Abstract: Reinforcement learning is one of the fastest-growing areas of machine learning and has achieved notable results in biomedicine, the Internet of Things (IoT), logistics, robotic control, and other fields. However, engineering applications still face many challenges, such as how to speed up the learning process and how to balance the trade-off between exploration and exploitation. Quantum technology, which can solve certain complex problems faster than classical methods, even those running on supercomputers, offers a new paradigm for overcoming these challenges in reinforcement learning. In this paper, a quantum-enhanced reinforcement learning algorithm is presented for optimal control. In this algorithm, the states and actions of reinforcement learning are quantized using quantum technology. A probability amplification method is then presented which, through this quantization, effectively avoids the trade-off between exploration and exploitation. Finally, the optimal control policy is learned during the reinforcement learning process. The performance of the quantized algorithm is demonstrated in two classical-control reinforcement learning environments of OpenAI Gym: MountainCar and CartPole. The preliminary results show that, compared with Q-learning, this quantized reinforcement learning method achieves better control performance without having to manage the trade-off between exploration and exploitation. Its learning performance remains stable for learning rates from 0.01 to 0.10, which suggests that it is promising for systems with unknown dynamics.
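The abstract does not give the algorithm's equations. As a rough illustration only, the sketch below simulates on a classical computer, using NumPy, the general kind of quantum-inspired action selection the abstract describes. Each state's actions are held in a superposition, an action is chosen by a simulated measurement, and a Grover-style amplitude amplification raises the probability of actions that led to good returns. This follows the general pattern of earlier quantum reinforcement learning work, not the authors' published method. The class name `QuantumInspiredAgent`, the scale `k`, the cap `max_iters`, and the rule tying the number of amplification steps to the TD target are all hypothetical choices.

```python
import numpy as np


class QuantumInspiredAgent:
    """Hypothetical tabular agent whose action choice is a simulated measurement.

    Each discrete state s keeps a real amplitude vector over the actions;
    measuring picks action a with probability amps[s, a] ** 2.
    """

    def __init__(self, n_states, n_actions, alpha=0.05, gamma=0.99,
                 k=1.0, max_iters=3, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma
        # Hypothetical amplification scale and cap on Grover steps per update.
        self.k, self.max_iters = k, max_iters
        self.values = np.zeros(n_states)  # state-value table V(s)
        # Every state starts in the uniform superposition over actions.
        self.uniform = np.full(n_actions, 1.0 / np.sqrt(n_actions))
        self.amps = np.tile(self.uniform, (n_states, 1))
        self.rng = np.random.default_rng(seed)

    def action_probs(self, s):
        # Born rule: probability = squared amplitude.
        return self.amps[s] ** 2

    def select_action(self, s):
        # Simulated "measurement": sample an action with probability |amplitude|^2.
        p = self.action_probs(s)
        return int(self.rng.choice(self.n_actions, p=p / p.sum()))

    def grover_step(self, s, a):
        amp = self.amps[s].copy()
        amp[a] = -amp[a]  # oracle: phase-flip the chosen action
        # Diffusion: reflect about the uniform superposition, (2|u><u| - I).
        amp = 2.0 * self.uniform * np.dot(self.uniform, amp) - amp
        self.amps[s] = amp / np.linalg.norm(amp)

    def update(self, s, a, r, s_next, done):
        # TD(0) update of the state value.
        target = r if done else r + self.gamma * self.values[s_next]
        self.values[s] += self.alpha * (target - self.values[s])
        # Hypothetical rule: more amplification for larger targets, clipped.
        n_iters = int(np.clip(self.k * target, 0, self.max_iters))
        for _ in range(n_iters):
            self.grover_step(s, a)
```

With four actions, one Grover step on action 2 moves all probability onto that action. Repeated steps eventually overshoot and lower the probability again, which is why the sketch caps them with `max_iters`. Exploration comes from the measurement itself rather than from an ε-greedy schedule, loosely mirroring the abstract's point that no explicit exploration/exploitation trade-off has to be tuned.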
Keywords: Quantum theory; Reinforcement learning; Quantum computation; State superposition; Optimal control
CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]