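Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2013, No. 9, pp. 838-844 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (No. 61070143, 61101248) and the Fundamental Research Funds for the Central Universities (No. K5051203003)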
Abstract: Because learning an action policy is time-consuming in reinforcement learning, a heuristic reinforcement learning method based on state backtracking is proposed. Repeated states arising during learning are analyzed, and by comparing the selection policies for repeated actions during state backtracking, a cost function is introduced to describe the importance of repeated actions. A new heuristic function is then defined by combining the action reward with the action cost. This heuristic function emphasizes the importance of actions to speed up learning and, at the same time, uses the cost function to compute the cost of action selection so as to reduce unnecessary exploration, which steadily improves learning efficiency. The cost-based action selection policy is proved to be reasonable. Two simulation scenarios are built, and the algorithm is applied to simulated robot path planning. The experimental results show that the method balances the rewards obtained against the costs incurred and effectively improves the convergence speed of Q-learning.
Classification No.: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
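The abstract describes its heuristic only in words, so the following is a rough illustrative sketch of the general idea and not the paper's algorithm. It runs tabular Q-learning on a small grid and biases greedy action selection with a score that trades the estimated reward against a cost. Everything specific here is an assumption made for illustration: the grid world, the names `step`, `heuristic`, `select_action` and `train`, the weight `beta`, and in particular the cost itself, which is taken to be the fraction of times an action led back to a state already visited in the same episode.

```python
import random
from collections import defaultdict

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def step(state, action, size, goal):
    """Move on a size x size grid; walking into a wall leaves the state unchanged."""
    row = min(max(state[0] + action[0], 0), size - 1)
    col = min(max(state[1] + action[1], 0), size - 1)
    nxt = (row, col)
    reward = 1.0 if nxt == goal else -0.01
    return nxt, reward, nxt == goal


def heuristic(q, tries, repeats, state, a, beta):
    """ASSUMED form: estimated reward minus a weighted repeat-cost in [0, 1]."""
    n = tries[(state, a)]
    cost = repeats[(state, a)] / n if n else 0.0
    return q[(state, a)] - beta * cost


def select_action(q, tries, repeats, state, epsilon, beta, rng):
    """Epsilon-greedy selection over the heuristic score instead of raw Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)),
               key=lambda a: heuristic(q, tries, repeats, state, a, beta))


def train(episodes=200, size=5, alpha=0.5, gamma=0.9,
          epsilon=0.1, beta=0.5, max_steps=500, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)      # Q(s, a) estimates
    tries = defaultdict(int)    # how often (s, a) was taken
    repeats = defaultdict(int)  # how often (s, a) led back to a visited state
    goal = (size - 1, size - 1)
    steps_per_episode = []
    for _ in range(episodes):
        state, visited, steps, done = (0, 0), {(0, 0)}, 0, False
        while not done and steps < max_steps:
            a = select_action(q, tries, repeats, state, epsilon, beta, rng)
            nxt, reward, done = step(state, ACTIONS[a], size, goal)
            tries[(state, a)] += 1
            if nxt in visited:  # the action returned to an earlier state
                repeats[(state, a)] += 1
            visited.add(nxt)
            best_next = 0.0 if done else max(
                q[(nxt, b)] for b in range(len(ACTIONS)))
            # Standard Q-learning update; only action selection is modified.
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            steps += 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

Calling `train()` returns the learned Q-table and the number of steps taken in each episode, which can be compared against a run with `beta=0.0` (plain epsilon-greedy Q-learning) to see whether the assumed cost term changes how quickly episodes shorten. The sketch makes no claim about reproducing the paper's reported results.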