Authors: LIU Yongchao; LI Tong; CHENG Yiheng; WANG Bo [1,2]; GAO Puzhen; TAN Sichao; TIAN Ruifeng
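Affiliations: [1] Heilongjiang Provincial Key Laboratory of Nuclear Power System and Equipment, Harbin Engineering University, Harbin 150001, China; [2] Key Laboratory of Nuclear Safety and Advanced Nuclear Energy Technology, Harbin Engineering University, Harbin 150001, China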
Source: Atomic Energy Science and Technology (《原子能科学技术》), 2024, No. 5, pp. 1076-1083 (8 pages)
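Funding: China National Nuclear Corporation (CNNC) Lingchuang Project (CNNC-LCKY-202245, CNNC-LCKY-202251); Fundamental Research Funds for the Central Universities (3072022JC2401).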
Abstract: Nuclear power plants require a large number of control systems to achieve effective control and safe operation. The reactor core, which houses the radioactive nuclear fuel heat source, is a key component, and reactor power control bears on both the safety and the economy of plant operation. To address the difficulty traditional PID controllers have with nonlinear power control over a large power range, this study derives and builds a reactor core model for a pressurized water reactor nuclear power plant, and simulates power control with an adaptive controller that combines a policy-gradient-based deep reinforcement learning method with a PID controller. The simulation results show that, compared with the traditional PID controller, the adaptive power controller based on the deep deterministic policy gradient algorithm responds faster, achieves higher control accuracy and stability, and is highly robust; it can control core power accurately and quickly and track load changes.
Nuclear power plants need a large number of control systems to achieve effective control and safe operation. The reactor core is the key component containing the radioactive nuclear fuel heat source, and reactor power control is related to the safety and economy of nuclear power plant operation. It is therefore of great significance to optimize the design of the nuclear reactor power controller. The control parameters of a PID controller are fixed in advance at the controller design stage, which leaves room to improve its control performance. To solve the problem that a traditional PID controller has difficulty handling nonlinear power control accurately in the high power range, this study derived and established a reactor core model for a pressurized water reactor nuclear power plant. The core model includes the heat transfer equation, the neutron dynamics equation and the reactivity equation. An adaptive controller that combines policy-gradient-based deep reinforcement learning (the deep deterministic policy gradient algorithm) with a PID (proportional integral derivative) controller was used to simulate power control, and a reward function was constructed. The reward function represents the optimization of several control evaluation indexes, such as response time, threat time, control accuracy, overshoot and oscillation. By interacting with the core model in real time, the deep deterministic policy gradient algorithm learns a policy that optimizes the PID control parameters in real time. Several groups of working conditions with different power levels and different power switching modes were tested. The simulation results show that, in the 100%FP-90%FP step power reduction process (training condition), the adaptive power controller based on the deep deterministic policy gradient algorithm has faster response speed and higher control accuracy and stability than the traditional PID controller. At the same time, under
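The control structure described in the abstract (a PID controller whose gains are adjusted online by a learned policy, scored by a reward function) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the class `AdaptivePID`, the function `toy_policy` (a hand-written gain schedule standing in for a trained deep deterministic policy gradient actor), the reward weights, and the integrator plant that replaces the reactor core model.

```python
from dataclasses import dataclass


@dataclass
class AdaptivePID:
    """PID controller whose gains can be replaced at every time step."""
    kp: float = 0.5
    ki: float = 0.05
    kd: float = 0.0
    integral: float = 0.0
    prev_error: float = 0.0

    def set_gains(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, error, dt):
        # Standard discrete PID law using the current gains.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def toy_policy(error):
    """Hypothetical stand-in for a trained actor network: error -> (kp, ki, kd)."""
    kp = 0.5 + 5.0 * min(abs(error), 0.1)  # larger error, larger proportional gain
    return kp, 0.05, 0.0


def reward(error, prev_error, w_err=1.0, w_osc=0.1):
    """Assumed reward: penalize tracking error and step-to-step error change."""
    return -(w_err * abs(error) + w_osc * abs(error - prev_error))


def simulate(setpoint=0.9, power=1.0, dt=0.1, steps=600):
    """Step the relative power from 1.0 toward the setpoint on a toy plant."""
    pid = AdaptivePID()
    total_reward = 0.0
    prev_error = setpoint - power
    for _ in range(steps):
        error = setpoint - power
        pid.set_gains(*toy_policy(error))  # gains chosen online by the policy
        u = pid.step(error, dt)
        power += u * dt  # integrator plant (assumption), NOT a reactor core model
        total_reward += reward(error, prev_error)
        prev_error = error
    return power, total_reward


if __name__ == "__main__":
    final_power, total_reward = simulate()
    print(f"final relative power: {final_power:.4f}, return: {total_reward:.3f}")
```

In a reinforcement learning setup of the kind the abstract describes, `toy_policy` would be replaced by the actor network, and the per-step value from `reward` would serve as the training signal.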
CLC number: TL36 [Nuclear Science and Technology: Nuclear Technology and Applications]