Research on optimal dispatch of integrated energy systems based on constrained reinforcement learning


Authors: LI Tianming; WANG Xiaojun[1]; DOU Jiaming; LIU Zhao; SI Fangyuan; HE Jinghan[1]

Affiliation: [1] School of Electrical Engineering, Beijing Jiaotong University, Beijing 100044, China

Source: Power System Protection and Control, 2025, No. 6, pp. 1-14 (14 pages)

Funding: National Natural Science Foundation of China (51977005); National Natural Science Foundation of China, Young Scientists Fund (52207112).

Abstract: Under the "dual-carbon" goal, the high penetration of distributed energy resources and the tighter coupling of heterogeneous energy sources make the optimal dispatch problem of integrated energy systems (IES) harder to solve. Deep reinforcement learning offers an effective means of addressing this problem. However, conventional deep reinforcement learning usually adds safety constraints to the reward function as weighted penalty terms. The weights are generally set by hand and held fixed during iteration, which to some extent degrades the algorithm's convergence performance and its ability to handle constraints. To address this, an IES optimal dispatch method based on constrained reinforcement learning is proposed. First, a safety value network is built from IES unit operating constraints and system power flow constraints. Coupled through Lagrange multipliers, it works dynamically in parallel with an economic value network, so that the two networks evaluate the safety value and the economic value of the agent's decisions, respectively. Second, following a primal-dual approach, the agent's policy and the Lagrange multipliers are updated alternately, which avoids the subjective bias that manually set penalty coefficients introduce into IES dispatch decisions. In addition, expert knowledge is used to guide the agent's training so that computing resources are not wasted on blind search. Finally, comparative simulation case studies on an electricity-heat coupled system verify the safety and efficiency of the proposed method.

Keywords: integrated energy system; optimal dispatch; deep reinforcement learning; constrained reinforcement learning

Classification No.: TP3 [Automation and Computer Technology: Computer Science and Technology]
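
The contrast the abstract draws between a fixed, hand-set penalty weight and an alternating primal-dual update can be illustrated on a one-variable toy problem. The sketch below is not the paper's method: it uses no value networks or policy, and the scalar `x` only stands in for the policy parameters. The cost `(x - 8)^2`, the limit `x <= 5`, the step sizes, and all function names are assumptions chosen for illustration. The Lagrangian is `(x - 8)^2 + lam * (x - 5)`, whose saddle point is `x = 5`, `lam = 6`.

```python
def cost_grad(x):
    """Gradient of the toy economic cost (x - 8)^2 (hypothetical cost)."""
    return 2.0 * (x - 8.0)


def violation(x, limit=5.0):
    """Signed safety constraint g(x) = x - limit; g(x) <= 0 means safe."""
    return x - limit


def fixed_penalty(weight, steps=2000, lr=0.05):
    """Minimise cost + weight * max(0, g(x)) with a hand-set, fixed weight."""
    x = 0.0
    for _ in range(steps):
        # The penalty only contributes a gradient while the limit is exceeded.
        grad = cost_grad(x) + (weight if violation(x) > 0 else 0.0)
        x -= lr * grad
    return x


def primal_dual(steps=2000, lr_x=0.05, lr_lam=0.05):
    """Alternate a primal descent step on x and a dual ascent step on lam."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: descend the Lagrangian cost(x) + lam * g(x) in x.
        x -= lr_x * (cost_grad(x) + lam)
        # Dual step: raise lam while the constraint is violated, lower it
        # otherwise, and project back onto lam >= 0.
        lam = max(0.0, lam + lr_lam * violation(x))
    return x, lam


if __name__ == "__main__":
    print("fixed weight 2.0 ->", fixed_penalty(2.0))
    print("primal-dual      ->", primal_dual())
```

In this toy setting a fixed weight that is too small (2.0 here) settles above the limit, while the primal-dual loop adjusts the multiplier on its own until the constraint holds. Whether the same behaviour carries over to the IES dispatch setting is what the paper's case studies examine.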

 
