Reinforcement Learning Algorithm Based on One-dimensional Convolutional Recurrent Network

Cited by: 8


Authors: CHANG Xin; LI Yanbin[1,2]; TIAN Miao; CHEN Suyi; DU Yufeng; ZHAO Yan[1,2]

Affiliations: [1] The 54th Research Institute of China Electronics Technology Group Corporation (CETC54), Shijiazhuang 050081, China; [2] Hebei Key Laboratory of Electromagnetic Spectrum Cognition and Control, The 54th Research Institute of China Electronics Technology Group Corporation (CETC54), Shijiazhuang 050081, China; [3] School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

Source: Computer Measurement & Control, 2022, No. 1, pp. 258-265 (8 pages)

Funding: China Postdoctoral Science Foundation (2021M693002).

Abstract: Existing deep reinforcement learning algorithms have difficulty converging in environments with high-dimensional state spaces. To address this, a reinforcement learning algorithm based on a one-dimensional convolutional recurrent network is proposed that extracts features along the time dimension. First, a deep reinforcement learning system is built on the deep Q network (DQN). Then, a one-dimensional convolutional layer is added to the neural network architecture of the deep recurrent Q network (DRQN) to extract features along the time dimension before the long short-term memory (LSTM) layer. Finally, the new algorithm is trained and tested in a time-series-dependent environment. The experimental results show that this change improves the agent's decision-making and gives deep reinforcement learning better performance in time-series-dependent environments with non-image input.
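The architecture described in the abstract (a one-dimensional convolution along the time axis, followed by an LSTM layer and a Q-value output) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract gives no layer sizes, so the kernel width, channel count, hidden size, action count, function names, and the choice to treat each state dimension as an input channel are all assumptions made for illustration, and the weights are random and untrained.

```python
import math
import random

def conv1d_time(seq, kernels, bias):
    """Valid 1D convolution along the time axis, followed by ReLU.
    seq: T observation vectors of length D; kernels: C filters of shape [K][D].
    Returns T-K+1 feature vectors of length C."""
    K, D = len(kernels[0]), len(seq[0])
    out = []
    for t in range(len(seq) - K + 1):
        feats = []
        for w, b in zip(kernels, bias):
            s = b + sum(w[k][d] * seq[t + k][d] for k in range(K) for d in range(D))
            feats.append(max(0.0, s))
        out.append(feats)
    return out

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in W]

def lstm_last_hidden(xs, W, U, b, H):
    """Single-layer LSTM over xs; returns the final hidden state.
    W: [4H][C], U: [4H][H], b: [4H]; assumed gate order: input, forget, cell, output."""
    h, c = [0.0] * H, [0.0] * H
    for x in xs:
        z = [a + r + bb for a, r, bb in zip(matvec(W, x), matvec(U, h), b)]
        i = [sigmoid(v) for v in z[0:H]]
        f = [sigmoid(v) for v in z[H:2 * H]]
        g = [math.tanh(v) for v in z[2 * H:3 * H]]
        o = [sigmoid(v) for v in z[3 * H:4 * H]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h

def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def init_params(rng, D, C, K, H, A):
    """Random, untrained weights; all sizes are illustrative assumptions."""
    return {
        "kernels": [rand_matrix(rng, K, D) for _ in range(C)],
        "conv_bias": [0.0] * C,
        "W": rand_matrix(rng, 4 * H, C),
        "U": rand_matrix(rng, 4 * H, H),
        "b": [0.0] * (4 * H),
        "H": H,
        "W_q": rand_matrix(rng, A, H),
        "b_q": [0.0] * A,
    }

def q_values(seq, params):
    """Conv1D over time -> LSTM -> linear head giving one Q-value per action."""
    feats = conv1d_time(seq, params["kernels"], params["conv_bias"])
    h = lstm_last_hidden(feats, params["W"], params["U"], params["b"], params["H"])
    return [q + bq for q, bq in zip(matvec(params["W_q"], h), params["b_q"])]

rng = random.Random(0)
params = init_params(rng, D=4, C=6, K=3, H=5, A=2)
# Hypothetical history: 8 time steps of a 4-dimensional, non-image state.
history = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
q = q_values(history, params)
greedy_action = max(range(len(q)), key=q.__getitem__)
```

In this sketch the convolution shortens the sequence from 8 steps to 6 before it reaches the LSTM, so each LSTM input already summarizes a short window of consecutive states. A trained agent would learn these weights with the usual DQN update rather than use random values.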

Keywords: reinforcement learning; deep learning; long short-term memory network; convolutional neural network; deep Q network

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
