Distribution Network Power Flow Optimization Method Based on Robust Reinforcement Learning (cited by: 3)

Authors: LI Xiaoxu, TIAN Meng [1], ZHU Ziyang, DONG Zhengcheng, GONG Li, WANG Xianpei [1] (Electronic Information School, Wuhan University, Wuhan 430072, China; School of Automation, Wuhan University of Technology, Wuhan 430072, China)

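Affiliations: [1] Electronic Information School, Wuhan University, Wuhan 430072; [2] School of Automation, Wuhan University of Technology, Wuhan 430072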

Source: High Voltage Engineering (《高电压技术》), 2023, No. 6, pp. 2329-2338 (10 pages)

Funding: National Natural Science Foundation of China (52177109); Key Research and Development Program of Hubei Province (2020BAB109).

Abstract: Traditional deep reinforcement learning is susceptible to disturbances such as sensor observation errors when optimizing distribution network power flow, so its robustness is poor. To address this, a distribution network power flow optimization method based on robust reinforcement learning is proposed. First, a power flow optimization model covering distributed generation, energy storage, and load units is established, with minimizing network loss as the objective and voltage and power flow limit violations as safety constraints. Then, the disturbance is modeled as an attack agent that perturbs the state observed by the main power flow optimization agent, yielding a two-agent zero-sum game robust reinforcement learning model. Finally, a two-agent Lagrange multiplier trust region policy optimization algorithm is proposed, in which the main agent and the attack agent are trained synchronously, learn asynchronously, and compete against each other. Simulation results show that the agent trained with this method makes safe decisions under different types of disturbance, improving the robustness and safety of distribution network power flow optimization.
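The ingredients named in the abstract (a bounded perturbation of the observed state, opposite rewards for the two agents, and a Lagrange multiplier on constraint violations) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every function name, the perturbation bound `eps`, and the dual-ascent step are assumptions made for illustration, and the trust region policy update itself is omitted.

```python
from typing import List, Tuple


def perturb_observation(obs: List[float], attack: List[float], eps: float) -> List[float]:
    """Attack agent shifts each observed value by at most eps (assumed bound)."""
    return [o + eps * max(-1.0, min(1.0, a)) for o, a in zip(obs, attack)]


def zero_sum_rewards(network_loss: float) -> Tuple[float, float]:
    """Main agent is rewarded for low network loss; the attacker gets the opposite."""
    main_reward = -network_loss
    return main_reward, -main_reward


def penalized_reward(reward: float, violation_cost: float, lam: float) -> float:
    """Lagrangian reward: subtract the weighted cost of voltage/flow limit violations."""
    return reward - lam * violation_cost


def update_multiplier(lam: float, mean_cost: float, cost_limit: float, step_size: float) -> float:
    """Dual ascent on the multiplier, projected onto the non-negative reals."""
    return max(0.0, lam + step_size * (mean_cost - cost_limit))


if __name__ == "__main__":
    # Hypothetical per-unit bus voltages as seen by the main agent.
    noisy = perturb_observation([1.0, 0.98], [2.0, -0.5], eps=0.02)
    main_r, attack_r = zero_sum_rewards(network_loss=0.3)
    print(noisy, main_r, attack_r)
    print(update_multiplier(0.5, mean_cost=0.2, cost_limit=0.0, step_size=0.1))
```

In this sketch the multiplier grows while the average violation cost exceeds its limit, which makes violations progressively more expensive for the main agent; how the paper schedules the two agents' updates is not specified in the abstract and is not modeled here.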

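Keywords: distribution network power flow optimization; robust reinforcement learning; zero-sum game; state perturbation; safe decision-making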

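Classification: TM744 [Electrical Engineering: Power Systems and Automation]; TP181 [Automation and Computer Technology: Control Theory and Control Engineering]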
