Heuristic Q-Learning Based on Cost Analysis of State Backtracking  Cited by: 9

Heuristically Accelerated State Backtracking Q-Learning Based on Cost Analysis


Authors: FANG Min[1], LI Hao[1]

Affiliation: [1] School of Computer Science, Xidian University, Xi'an 710071

Source: Pattern Recognition and Artificial Intelligence, 2013, No. 9, pp. 838-844 (7 pages)

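Funding: Supported by the National Natural Science Foundation of China (No. 61070143, 61101248) and the Fundamental Research Funds for the Central Universities (No. K5051203003)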

Abstract: Because learning an action policy is time-consuming for reinforcement learning algorithms, a heuristic reinforcement learning method based on state backtracking is proposed. Repeated states in the learning process are analyzed, and by comparing the selection policies for repeated actions during state backtracking, a cost function is introduced to describe the importance of repeated actions. A new probability-based heuristic function is defined by combining the action reward with the action cost. While emphasizing the importance of actions to speed up learning, the heuristic function uses the cost function to compute the cost of action selection, which reduces unnecessary exploration and thereby improves learning efficiency steadily. The action selection policy based on the cost function is proved to be reasonable. Two simulation scenarios are built, and the algorithm is applied to simulated robot path planning. The experimental results show that the proposed method balances the rewards obtained against the costs incurred and effectively improves the convergence speed of Q-learning.

Keywords: cost analysis; heuristic function; state backtracking; Q-learning

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
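
The abstract does not state the heuristic function itself, so the following is only a minimal sketch of the general idea under stated assumptions: tabular Q-learning in which greedy action selection is biased by a cost term that penalizes actions that led back to an already visited state. The cost definition (fraction of tries that looped back), the weight `xi`, the grid world, and all function names are hypothetical illustrations, not the authors' formulation.

```python
import random
from collections import defaultdict

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
N_ACTIONS = len(ACTIONS)


def step(state, action_idx, size, goal):
    """Deterministic grid move; bumping into a wall leaves the state unchanged."""
    dr, dc = ACTIONS[action_idx]
    nxt = (min(max(state[0] + dr, 0), size - 1),
           min(max(state[1] + dc, 0), size - 1))
    done = nxt == goal
    return nxt, (1.0 if done else -0.01), done


def action_cost(loops, tries, state, a):
    """Assumed cost: fraction of times action `a` in `state` led back to `state`."""
    t = tries[state][a]
    return loops[state][a] / t if t else 0.0


def select_action(q, loops, tries, state, epsilon, xi, rng):
    """Epsilon-greedy over Q(s, a) - xi * cost(s, a); xi is a hypothetical weight."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    scores = [q[state][a] - xi * action_cost(loops, tries, state, a)
              for a in range(N_ACTIONS)]
    return scores.index(max(scores))  # ties resolve to the lowest index


def train(episodes=300, size=4, alpha=0.5, gamma=0.9,
          epsilon=0.1, xi=0.05, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * N_ACTIONS)
    loops = defaultdict(lambda: [0] * N_ACTIONS)
    tries = defaultdict(lambda: [0] * N_ACTIONS)
    goal = (size - 1, size - 1)
    steps_per_episode = []
    for _ in range(episodes):
        state, last_action = (0, 0), {}
        for t in range(1, max_steps + 1):
            a = select_action(q, loops, tries, state, epsilon, xi, rng)
            tries[state][a] += 1
            last_action[state] = a
            nxt, reward, done = step(state, a, size, goal)
            # Standard Q-learning update.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            # Backtracking check: the agent returned to a state already seen
            # in this episode, so charge the action it last took from there.
            if nxt in last_action:
                loops[nxt][last_action[nxt]] += 1
            state = nxt
            if done:
                break
        steps_per_episode.append(t)
    return q, steps_per_episode
```

Calling `train()` returns the Q-table and the number of steps taken in each episode, which can be plotted to compare convergence for different values of `xi` (setting `xi=0` reduces the sketch to plain epsilon-greedy Q-learning).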

 
