Real-time Security-constrained Economic Dispatch of Power Systems Based on Dual Buffer Generative Adversarial Imitation Learning

Authors: LI Dongying; ZHU Jianquan [1]; CHEN Yixi

Affiliation: [1] School of Electric Power Engineering, South China University of Technology, Guangzhou 510640, Guangdong Province, China

Source: Power System Technology (《电网技术》), 2025, No. 3, pp. 1121-1129, I0076-I0079 (13 pages)

Funding: National Natural Science Foundation of China (51977081).

Abstract: As the penetration of renewable energy rises, the volatility and randomness of power systems intensify, posing severe challenges to the secure and economic operation of the grid. To address this, the paper proposes a real-time security-constrained economic dispatch method based on an improved generative adversarial imitation learning algorithm. First, the multi-period security-constrained economic dispatch problem of a renewable-integrated power system is formulated as a Markov decision process. Second, because conventional deep reinforcement learning algorithms suffer from long training times and highly subjective design, a generative adversarial imitation learning algorithm is used to solve the Markov decision process. Third, an improved generative adversarial imitation learning algorithm is proposed: a dual buffer mechanism makes generative adversarial imitation learning compatible with off-policy deep reinforcement learning algorithms, so that it can be combined with the soft actor-critic algorithm, which significantly improves training performance. Case study results show that, while maintaining millisecond-level decision speed, the proposed method significantly outperforms traditional algorithms in convergence speed during offline training and in the economy and security of online decisions.
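The abstract does not say what the two buffers hold or how they interact, so the following is only a minimal sketch of one plausible reading: one buffer stores expert (state, action) pairs, the other stores pairs collected by past policies, and the imitation reward is recomputed from the current discriminator whenever a batch is sampled, which is what lets an off-policy learner reuse old data. All class names, the linear logistic discriminator, the surrogate reward `-log(1 - D)`, and the toy data are assumptions made for illustration. The soft actor-critic update itself is omitted.

```python
import numpy as np


class PairBuffer:
    """Fixed-capacity ring buffer of (state, action) pairs (hypothetical helper)."""

    def __init__(self, capacity, state_dim, action_dim):
        self.states = np.zeros((capacity, state_dim))
        self.actions = np.zeros((capacity, action_dim))
        self.capacity = capacity
        self.size = 0
        self.pos = 0

    def add(self, state, action):
        self.states[self.pos] = state
        self.actions[self.pos] = action
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, rng):
        idx = rng.integers(0, self.size, size=batch_size)
        return self.states[idx], self.actions[idx]


class LogisticDiscriminator:
    """Linear logistic discriminator D(s, a) = sigmoid(w . [s, a] + b)."""

    def __init__(self, in_dim, lr=0.5):
        self.w = np.zeros(in_dim)
        self.b = 0.0
        self.lr = lr

    def prob_expert(self, states, actions):
        x = np.concatenate([states, actions], axis=1)
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

    def update(self, expert_batch, policy_batch):
        # One gradient step on binary cross-entropy: expert = 1, policy = 0.
        xe = np.concatenate(expert_batch, axis=1)
        xp = np.concatenate(policy_batch, axis=1)
        x = np.vstack([xe, xp])
        y = np.concatenate([np.ones(len(xe)), np.zeros(len(xp))])
        p = 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))
        grad = p - y
        self.w -= self.lr * (x.T @ grad) / len(y)
        self.b -= self.lr * grad.mean()

    def reward(self, states, actions, eps=1e-8):
        # Surrogate reward: larger when a pair looks expert-like.
        return -np.log(1.0 - self.prob_expert(states, actions) + eps)


def run_demo(seed=0, steps=200, batch_size=64):
    """Train the discriminator on toy data and return mean rewards."""
    rng = np.random.default_rng(seed)
    expert_buf = PairBuffer(1000, state_dim=1, action_dim=1)
    policy_buf = PairBuffer(1000, state_dim=1, action_dim=1)
    # Toy data (assumption): expert actions cluster near +1, policy near -1.
    for _ in range(500):
        s = rng.normal(size=1)
        expert_buf.add(s, rng.normal(1.0, 0.1, size=1))
        policy_buf.add(s, rng.normal(-1.0, 0.1, size=1))
    disc = LogisticDiscriminator(in_dim=2)
    for _ in range(steps):
        disc.update(expert_buf.sample(batch_size, rng),
                    policy_buf.sample(batch_size, rng))
    # Rewards are relabeled at sample time, not stored with the transitions.
    r_expert = disc.reward(*expert_buf.sample(batch_size, rng)).mean()
    r_policy = disc.reward(*policy_buf.sample(batch_size, rng)).mean()
    return r_expert, r_policy


if __name__ == "__main__":
    print(run_demo())
```

In this sketch, recomputing the reward at sample time keeps old transitions in the policy buffer consistent with the current discriminator. Whether the paper uses the same relabeling scheme cannot be determined from the abstract.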

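Keywords: security-constrained economic dispatch; imitation learning; generative adversarial network; dual buffer mechanism; deep reinforcement learning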

Classification code: TM721 [Electrical Engineering: Power Systems and Automation]

 
