Authors: 李天明 (LI Tianming), 王小君 (WANG Xiaojun), 窦嘉铭 (DOU Jiaming), 刘曌 (LIU Zhao), 司方远 (SI Fangyuan), 和敬涵 (HE Jinghan) (School of Electrical Engineering, Beijing Jiaotong University, Beijing 100044, China)
Source: Power System Protection and Control (《电力系统保护与控制》), 2025, No. 6, pp. 1-14 (14 pages)
Funding: Supported by the National Natural Science Foundation of China (51977005) and the Youth Fund of the National Natural Science Foundation of China (52207112).
Abstract: Under the "dual-carbon" goal, the high penetration of distributed energy and the intensified coupling of heterogeneous energy sources make the optimal dispatch problem of integrated energy systems (IES) harder to solve, and deep reinforcement learning provides an effective means of addressing this challenge. However, traditional deep reinforcement learning usually adds safety constraints to the reward function as weighted penalty terms, with weighting coefficients that are set manually and kept fixed during iteration, which limits the convergence performance and constraint-handling capability of the algorithm to some extent. To address this issue, an IES optimal dispatch method based on constrained reinforcement learning is proposed. First, a safety value network built on IES unit operation and system power flow constraints is constructed; through Lagrange multipliers it works dynamically in parallel with an economic value network, so that the safety and economic value of the agent's decisions are evaluated separately. Second, following the primal-dual approach, the agent policy and the Lagrange multipliers are updated alternately, avoiding the subjective bias that manually set penalty coefficients introduce into IES dispatch decisions. In addition, expert knowledge is used to guide the agent's training and prevent the waste of computing resources caused by blind exploration. Finally, comparative simulation case studies on an electric-thermal coupled system verify the safety and efficiency of the proposed method.
Keywords: integrated energy system; optimal dispatch; deep reinforcement learning; constrained reinforcement learning
Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]
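The abstract describes alternating primal-dual updates of the agent policy and the Lagrange multiplier in a constrained reinforcement learning setting. As a rough illustration of that general idea only (not the paper's implementation: the toy objective, the cost function, the threshold d, the Gaussian policy, and all hyper-parameters below are assumptions), a minimal numpy sketch of a Lagrangian primal-dual update might look like this:

```python
# Minimal sketch of a Lagrangian primal-dual update for constrained RL.
# Illustrative only: the toy objective, cost, threshold and learning rates
# are assumptions, not the method or parameters from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy single-state problem: action a in [0, 1] scales a dispatch decision.
# Economic objective: reward r(a) = a. Safety constraint: E[a^2] <= d.
d = 0.25              # constraint threshold (assumed)
theta = 0.0           # policy parameter: pre-squash mean of a Gaussian policy
lam = 0.0             # Lagrange multiplier, updated by dual ascent
lr_theta, lr_lam = 0.05, 0.1
sigma = 0.1           # fixed exploration noise

def policy_mean(theta):
    # squash the parameter into [0, 1]
    return 1.0 / (1.0 + np.exp(-theta))

for it in range(2000):
    # sample a batch of actions from the current stochastic policy
    mu = policy_mean(theta)
    a = np.clip(mu + sigma * rng.standard_normal(256), 0.0, 1.0)  # clipping is a simplification
    r = a            # "economic value" samples
    c = a ** 2       # "safety cost" samples

    # primal step: ascend the Lagrangian  L = E[r] - lam * (E[c] - d)
    # score-function (REINFORCE-style) gradient estimate w.r.t. theta
    dlogpi_dtheta = (a - mu) / (sigma ** 2) * mu * (1.0 - mu)
    grad_theta = np.mean((r - lam * c) * dlogpi_dtheta)
    theta += lr_theta * grad_theta

    # dual step: raise lam on constraint violation, project back to lam >= 0
    lam = max(0.0, lam + lr_lam * (np.mean(c) - d))

print(f"policy mean action = {policy_mean(theta):.3f}, multiplier = {lam:.3f}, "
      f"approx. expected cost = {policy_mean(theta) ** 2 + sigma ** 2:.3f}")
```

In this sketch the primal step ascends the Lagrangian E[r] - λ(E[c] - d) with respect to the policy parameter, while the dual step increases λ whenever the estimated cost exceeds the limit and projects it back to λ ≥ 0. The multiplier therefore adapts during training, which mirrors the abstract's point about avoiding a hand-tuned, fixed penalty coefficient.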