Q-table initialization approach for safe exploration based on factorization machine

Authors: ZENG Bosen; ZHONG Yong [1,2]; NIU Xianhua

Affiliations: [1] Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China; [3] School of Network and Communication Engineering, Chengdu Technological University, Chengdu, Sichuan 611730, China; [4] National Key Laboratory of Science and Technology on Communications (University of Electronic Science and Technology of China), Chengdu, Sichuan 611731, China; [5] School of Computer and Software Engineering, Xihua University, Chengdu, Sichuan 610039, China

Source: Journal of Computer Applications (《计算机应用》), 2022, No. 1, pp. 209-214 (6 pages)

Funding: China Postdoctoral Science Foundation funded project (2019M663475).

Abstract: To address the problem that most exploration/exploitation strategies in reinforcement learning ignore the risk introduced by the agent's random action selection during exploration, a Q-table initialization approach based on the Factorization Machine (FM) was proposed for safe exploration. First, the already explored Q-values in the Q-table were introduced as prior knowledge. Then, FM was used to model the latent interactions between states and actions in that prior knowledge. Finally, the unknown Q-values in the Q-table were predicted from this model to further guide the agent's exploration. In A/B tests conducted in Cliffwalk, a grid reinforcement learning environment of OpenAI Gym, the numbers of bad exploration episodes of the Boltzmann and Upper Confidence Bound (UCB) exploration/exploitation strategies based on the proposed approach were reduced by 68.12% and 89.98%, respectively. Experimental results show that the proposed approach improves the exploration safety of the traditional strategies and accelerates convergence at the same time.

Keywords: reinforcement learning; Q-learning; factorization machine; Q-table initialization; safe exploration

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
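
The three steps described in the abstract (take explored Q-values as prior knowledge, fit an FM on them, predict the unexplored entries) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the feature encoding, loss, or optimizer, so the one-hot state/action features, squared-error loss, plain SGD, and all hyperparameters below are assumptions, and the function names `fit_fm` and `init_q_table` are hypothetical. The toy Q-table is synthetic and is not the Cliffwalk environment.

```python
import numpy as np

def fit_fm(known, n_states, n_actions, k=4, lr=0.05, epochs=500, reg=1e-3, seed=0):
    """Fit a second-order FM on (state, action, q) triples (assumed setup).

    With one-hot state and action features, the FM prediction reduces to
    q_hat(s, a) = w0 + ws[s] + wa[a] + <Vs[s], Va[a]>.
    """
    rng = np.random.default_rng(seed)
    w0 = 0.0
    ws = np.zeros(n_states)                    # per-state linear weights
    wa = np.zeros(n_actions)                   # per-action linear weights
    Vs = rng.normal(0.0, 0.1, (n_states, k))   # latent factors of states
    Va = rng.normal(0.0, 0.1, (n_actions, k))  # latent factors of actions
    known = list(known)
    for _ in range(epochs):
        for i in rng.permutation(len(known)):
            s, a, q = known[i]
            err = w0 + ws[s] + wa[a] + Vs[s] @ Va[a] - q
            # SGD step on squared error with L2 regularization (assumed).
            w0 -= lr * err
            ws[s] -= lr * (err + reg * ws[s])
            wa[a] -= lr * (err + reg * wa[a])
            vs_old = Vs[s].copy()
            Vs[s] -= lr * (err * Va[a] + reg * Vs[s])
            Va[a] -= lr * (err * vs_old + reg * Va[a])
    return w0, ws, wa, Vs, Va

def init_q_table(q_table, explored_mask, **fm_kwargs):
    """Keep explored Q-values; fill unexplored entries with FM predictions."""
    n_states, n_actions = q_table.shape
    known = [(s, a, q_table[s, a]) for s, a in zip(*np.nonzero(explored_mask))]
    w0, ws, wa, Vs, Va = fit_fm(known, n_states, n_actions, **fm_kwargs)
    pred = w0 + ws[:, None] + wa[None, :] + Vs @ Va.T
    return np.where(explored_mask, q_table, pred)

# Toy data (synthetic): action 3 is "risky" in every state.
state_bias = np.linspace(-1.0, 0.0, 6)
action_bias = np.array([0.0, -0.1, -0.2, -2.0])
q_true = state_bias[:, None] + action_bias[None, :]

explored = np.ones_like(q_true, dtype=bool)
explored[2, 3] = False   # the risky action was never tried in state 2
explored[4, 1] = False

q_observed = np.where(explored, q_true, 0.0)  # unexplored entries start at 0
filled = init_q_table(q_observed, explored)
print(np.round(filled, 2))
```

In this toy setting, a zero-initialized entry for the untried risky action would look more attractive than the explored alternatives, whereas the FM-predicted value inherits the low estimate learned for that action in other states. That is the intuition behind using the predicted Q-values to steer a Boltzmann or UCB policy away from unsafe actions, under the assumptions stated above.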
