Q-table initialization approach for safe exploration based on factorization machine


Authors: ZENG Bosen; ZHONG Yong [1,2]; NIU Xianhua

Affiliations: [1] Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610041, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China; [3] School of Network and Communication Engineering, Chengdu Technological University, Chengdu Sichuan 611730, China; [4] National Key Laboratory of Science and Technology on Communications (University of Electronic Science and Technology of China), Chengdu Sichuan 611731, China; [5] School of Computer and Software Engineering, Xihua University, Chengdu Sichuan 610039, China

Source: Journal of Computer Applications, 2022, No. 1, pp. 209-214 (6 pages)

Funding: China Postdoctoral Science Foundation project (2019M663475).

Abstract: Most exploration/exploitation strategies in reinforcement learning ignore the risk introduced when the agent selects actions at random during exploration. To address this problem, a Q-table initialization approach based on Factorization Machine (FM) is proposed for safe exploration. First, the Q-values already explored in the Q-table are introduced as prior knowledge. Then, an FM is used to model the latent interactions between states and actions in this prior knowledge. Finally, the unknown Q-values in the Q-table are predicted from this model to further guide the agent's exploration. In A/B tests conducted in Cliffwalk, a grid reinforcement learning environment of OpenAI Gym, the number of bad exploration episodes for the Boltzmann and Upper Confidence Bound (UCB) exploration/exploitation strategies based on the proposed approach decreased by 68.12% and 89.98%, respectively. The experimental results show that the proposed approach improves the exploration safety of the traditional strategies and also accelerates convergence.
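The three steps described in the abstract (take explored Q-values as prior knowledge, fit an FM to them, predict the unexplored entries) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_fm` and `init_q_table`, the one-hot encoding of states and actions, the plain SGD training loop, and all hyperparameters are hypothetical choices made for the example.

```python
import random


def fit_fm(known, n_states, n_actions, k=4, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Fit a second-order FM to explored Q-values with plain SGD (assumed setup).

    `known` maps (state, action) -> Q-value. States and actions are treated as
    one-hot features, so exactly two features are active per sample and the FM
    prediction reduces to: w0 + w_s[s] + w_a[a] + <V_s[s], V_a[a]>.
    """
    rng = random.Random(seed)
    w0 = 0.0
    w_s = [0.0] * n_states          # linear weight per state
    w_a = [0.0] * n_actions         # linear weight per action
    # Latent factor vectors of size k, initialized with small random values.
    V_s = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_states)]
    V_a = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_actions)]

    def predict(s, a):
        interaction = sum(x * y for x, y in zip(V_s[s], V_a[a]))
        return w0 + w_s[s] + w_a[a] + interaction

    samples = list(known.items())
    for _ in range(epochs):
        rng.shuffle(samples)
        for (s, a), q in samples:
            err = predict(s, a) - q  # gradient of 0.5 * err**2 w.r.t. prediction
            w0 -= lr * err
            w_s[s] -= lr * (err + reg * w_s[s])
            w_a[a] -= lr * (err + reg * w_a[a])
            for f in range(k):
                vs, va = V_s[s][f], V_a[a][f]
                V_s[s][f] -= lr * (err * va + reg * vs)
                V_a[a][f] -= lr * (err * vs + reg * va)
    return predict


def init_q_table(known, n_states, n_actions, **fm_kwargs):
    """Keep explored Q-values as they are; fill unexplored cells with FM predictions."""
    predict = fit_fm(known, n_states, n_actions, **fm_kwargs)
    table = []
    for s in range(n_states):
        row = []
        for a in range(n_actions):
            row.append(known[(s, a)] if (s, a) in known else predict(s, a))
        table.append(row)
    return table
```

In this sketch the explored entries are never overwritten; only the unexplored cells receive model estimates. A Boltzmann or UCB policy that starts from such a table would see a pessimistic estimate for an action that proved costly in other states, rather than a uniform default value.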

Keywords: reinforcement learning; Q-learning; Factorization Machine (FM); Q-table initialization; safe exploration

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
