Research on Intelligent Control Method of Operating Temperature of Reactor Thermal System Based on Deep Reinforcement Learning


Authors: Liu Yongchao (刘永超), Tan Sichao (谭思超), Li Tong (李桐), Cheng Jiahao (程家豪), Wang Bo (王博) [1,2], Gao Puzhen (高璞珍), Tian Ruifeng (田瑞峰)

Affiliations: [1] Heilongjiang Provincial Key Laboratory of Nuclear Power System & Equipment, Harbin Engineering University, Harbin 150001, China; [2] Key Laboratory of Nuclear Safety and Advanced Nuclear Energy Technology, Ministry of Industry and Information Technology, Harbin Engineering University, Harbin 150001, China

Source: Nuclear Power Engineering (《核动力工程》), 2024, Issue S2, pp. 197-205 (9 pages)

Funding: China National Nuclear Corporation (CNNC) Lingchuang (领创) Project (CNNC-LCKY-202251).

Abstract: Traditional proportional-integral-derivative (PID) control has difficulty achieving good, stable control performance. This paper proposes an intelligent control method for the operating temperature of a reactor thermal system based on deep reinforcement learning, in four steps: (1) a RELAP5 model of the reactor thermal system is built and extended with an interaction interface so that it can support deep reinforcement learning; (2) a multivariable long short-term memory (LSTM) neural network is coupled to the Soft Actor-Critic (SAC) algorithm to extract temporal features from the control history; (3) the control model, driven by the optimization objective, collects its own data samples and optimizes its control policy through self-learning; (4) end-to-end control of the operating temperature is achieved from the multivariable state features and temporal features. Simulation comparisons with a PID controller indicate that the proposed method has excellent load-following and disturbance-rejection capability, together with good environmental adaptability and control robustness.
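The interaction pattern outlined in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyThermalEnv` is a hypothetical first-order plant standing in for the RELAP5 model, the three-variable state, the reward, and all numeric values are invented, and neither SAC nor an LSTM is implemented. The sketch only shows an environment with `reset`/`step` methods that returns a sliding window of multivariate observations (the kind of history a recurrent encoder could consume), plus a discrete PID baseline of the sort the paper compares against.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyThermalEnv:
    """Hypothetical first-order thermal plant; a stand-in, NOT a RELAP5 model."""
    setpoint: float = 300.0   # target temperature (arbitrary units, assumed)
    ambient: float = 280.0    # temperature the plant relaxes to with zero action
    tau: float = 20.0         # thermal time constant, in steps (assumed)
    gain: float = 50.0        # steady-state temperature rise at full action
    window: int = 8           # number of past observations kept as history
    temp: float = 280.0
    history: deque = field(default_factory=deque)

    def _obs(self, action):
        # Multivariate state: temperature, tracking error, last action.
        return (self.temp, self.setpoint - self.temp, action)

    def reset(self):
        self.temp = self.ambient
        self.history = deque([self._obs(0.0)] * self.window, maxlen=self.window)
        return list(self.history)

    def step(self, action):
        action = max(0.0, min(1.0, action))         # clip actuator to [0, 1]
        target = self.ambient + self.gain * action  # where the plant would settle
        self.temp += (target - self.temp) / self.tau
        self.history.append(self._obs(action))
        reward = -abs(self.setpoint - self.temp)    # penalize tracking error
        return list(self.history), reward


class PID:
    """Discrete PID baseline with a unit time step and no anti-windup."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def act(self, error):
        self.integral += error
        deriv = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv


def run_episode(env, policy, steps=400):
    """Roll out a policy that maps the observation history to an action."""
    obs = env.reset()
    total = 0.0
    for _ in range(steps):
        obs, reward = env.step(policy(obs))
        total += reward
    return total, env.temp


# Baseline rollout: the PID only looks at the latest tracking error.
env = ToyThermalEnv()
pid = PID(kp=0.2, ki=0.01, kd=0.0)
total_reward, final_temp = run_episode(env, lambda hist: pid.act(hist[-1][1]))
```

In this sketch the `policy` argument is the plug-in point: a learned controller would take the whole history window rather than only the latest error, which is the role the abstract assigns to the LSTM-augmented SAC agent.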

Keywords: reactor thermal system; deep reinforcement learning; Soft Actor-Critic (SAC); long short-term memory (LSTM); intelligent control

Classification: TL361 [Nuclear Science and Technology: Nuclear Technology and Applications]

 
