Distribution Network Power Flow Optimization Method Based on Robust Reinforcement Learning  Cited by: 3


Authors: LI Xiaoxu; TIAN Meng[1]; ZHU Ziyang; DONG Zhengcheng; GONG Li; WANG Xianpei[1]

Affiliations: [1] Electronic Information School, Wuhan University, Wuhan 430072, China; [2] School of Automation, Wuhan University of Technology, Wuhan 430072, China

Source: High Voltage Engineering (《高电压技术》), 2023, No. 6, pp. 2329-2338 (10 pages)

Funding: National Natural Science Foundation of China (52177109); Key Research and Development Program of Hubei Province (2020BAB109).

Abstract: Traditional deep reinforcement learning is vulnerable to disturbances such as sensor observation errors when optimizing distribution network power flow, so its robustness is poor. To address this, a distribution network power flow optimization method based on robust reinforcement learning is proposed. First, a power flow optimization model covering distributed generation, energy storage, and load units is established, with minimizing network losses as the objective and voltage and power flow limit violations as safety constraints. Then the disturbance is modeled as an attack agent that perturbs the observed state of the main power flow optimization agent, yielding a two-agent zero-sum game robust reinforcement learning model. Finally, a dual-agent Lagrange-multiplier trust region policy optimization algorithm is proposed, in which the main agent and the attack agent are trained synchronously, learn asynchronously, and compete against each other. Simulation results show that an agent trained with this method makes safe decisions under different types of disturbance, improving the robustness and safety of distribution network power flow optimization.

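Keywords: distribution network power flow optimization; robust reinforcement learning; zero-sum game; state perturbation; safe decision-making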

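Classification: TM744 [Electrical Engineering: Power Systems and Automation]; TP181 [Automation and Computer Technology: Control Theory and Control Engineering]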
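The abstract names three ingredients: an attacker that perturbs the main agent's observed state, a zero-sum reward structure, and a Lagrange multiplier that enforces the safety constraints. The sketch below illustrates how these pieces could fit together in the simplest possible form. It is not the authors' algorithm: the trust region policy update is omitted entirely, and every function name, the perturbation bound `epsilon`, the `cost_limit`, and the step size are hypothetical choices made for illustration.

```python
# Illustrative sketch only; all names and parameters are assumptions,
# not taken from the paper. The trust-region policy update is omitted.

def perturb_observation(obs, attack, epsilon):
    """Add the attacker's perturbation to the observed state.

    Each attack component is clipped to [-epsilon, epsilon] (an assumed
    bound) so the perturbed observation stays close to the true one.
    """
    return [o + max(-epsilon, min(epsilon, a)) for o, a in zip(obs, attack)]


def zero_sum_rewards(network_loss):
    """Return (main_reward, attacker_reward) for a zero-sum game.

    The main agent is rewarded for low network loss; the attacker
    receives exactly the negated reward.
    """
    main_reward = -network_loss
    return main_reward, -main_reward


def lagrangian_objective(main_reward, constraint_cost, lam):
    """Reward penalized by the multiplier-weighted safety cost."""
    return main_reward - lam * constraint_cost


def update_multiplier(lam, constraint_cost, cost_limit, step_size):
    """Projected dual ascent on the Lagrange multiplier.

    The multiplier grows while the safety cost (for example voltage or
    power flow limit violations) exceeds the limit, and is kept >= 0.
    """
    return max(0.0, lam + step_size * (constraint_cost - cost_limit))


if __name__ == "__main__":
    true_obs = [1.00, 0.95]  # hypothetical per-unit bus voltages
    seen_obs = perturb_observation(true_obs, attack=[0.5, -0.01], epsilon=0.05)
    main_r, attack_r = zero_sum_rewards(network_loss=2.0)
    lam = update_multiplier(lam=0.0, constraint_cost=0.3,
                            cost_limit=0.1, step_size=0.5)
    print(seen_obs, main_r, attack_r,
          lagrangian_objective(main_r, 0.3, lam))
```

In a full training loop, the main agent would maximize the penalized objective on perturbed observations while the attacker maximizes its own negated reward, with the two policies updated in alternation; how the paper schedules those updates is not specified in the abstract.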

 
