Authors: Ma Qing [1]; Deng Changhong [1] (School of Electrical Engineering and Automation, Wuhan University, Wuhan 430072, China)
Source: Transactions of China Electrotechnical Society (《电工技术学报》), 2024, No. 5, pp. 1300-1312 (13 pages)
Funding: National Key R&D Program of China (2017YFB0903705).
Abstract: To quickly suppress the reactive power and voltage fluctuations caused by the random output changes of distributed energy resources, machine learning (ML) methods represented by deep reinforcement learning (DRL) and imitation learning (IL) have recently been applied to volt-var control (VVC), replacing traditional methods that require a large number of iterations. Although the ML methods in the existing literature can achieve fast online VVC optimization, shortcomings such as slow offline training and insufficient generality still hinder their practical application. This paper first proposes a single-agent simplified DRL (SASDRL) method suited to the centralized control of transmission networks. Built on the classic "Actor-Critic" architecture and on the fact that whether the Actor network can generate good control strategies depends heavily on whether the Critic network can evaluate them accurately, the method simplifies and improves the offline training of DRL-based VVC; its core ideas are a simplified Critic training procedure and a changed update scheme for the Actor and Critic networks. The sequential decision problem used in traditional DRL-based VVC is reduced to a single-point decision problem, and the Critic output is changed from a sequential action value to the reward corresponding to the current control strategy. In addition, the Critic network is trained in advance to accelerate the convergence of the Actor network, which removes the computational waste caused by the agent's random search in the early training stage and greatly improves the offline training speed, while retaining DRL's advantages of requiring no massive labeled data and of strong generality. Second, a multi-agent simplified DRL (MASDRL) method suited to the decentralized, zero-communication control of active distribution networks is proposed. It generalizes the core idea of SASDRL into a multi-agent version and uses imitation learning for initialization, injecting the global optimization idea into each agent in advance to improve the local coordinated control among reactive power devices. Finally, simulation results on a modified IEEE 118-bus test case verify the correctness and rapidity of the proposed methods.
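The SASDRL training scheme summarized above (the Critic regresses the immediate reward of a single-point control decision and is trained before the Actor, so the Actor is not wasted on random search) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation; the use of PyTorch, the network sizes, the state/action dimensions, and helper names such as pretrain_critic and train_actor are assumptions introduced here for illustration only.

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 236, 54  # hypothetical dimensions, not taken from the paper

actor = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(),
                      nn.Linear(256, ACTION_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
                       nn.Linear(256, 1))

def pretrain_critic(samples, epochs=50):
    # The Critic learns to predict the immediate reward r(s, a) of a one-shot VVC
    # decision (e.g. obtained from power-flow evaluations), rather than a
    # sequential action value as in standard DRL.
    opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
    for _ in range(epochs):
        for s, a, r in samples:  # batched tensors: state, action, scalar reward
            loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=-1)), r)
            opt.zero_grad(); loss.backward(); opt.step()

def train_actor(states, epochs=50):
    # With the Critic already trained, the Actor is updated to maximize the
    # Critic's predicted reward, skipping the random-search phase of early training.
    opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
    for _ in range(epochs):
        for s in states:
            loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
            opt.zero_grad(); loss.backward(); opt.step()

In this sketch the Critic stays fixed during Actor updates simply because only the Actor's parameters are given to the optimizer; gradients still flow through the Critic to the action, which is what lets the pre-trained Critic guide the Actor from the first update.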
Keywords: volt-var control; centralized control; single-agent simplified reinforcement learning; decentralized control; multi-agent simplified reinforcement learning
CLC number: TM76 [Electrical Engineering: Power Systems and Automation]