Voltage Optimization Control of Distribution Networks Considering Inter-Regional Auxiliary Rewards

Authors: ZHOU Xiang (周祥); LI Xiaolu (李晓露); LIU Jinsong (柳劲松); LIN Shunfu (林顺富)

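Affiliations: [1] College of Electrical Engineering, Shanghai University of Electric Power, Shanghai 200090, China; [2] Electric Power Research Institute, State Grid Shanghai Electric Power Company, Shanghai 200437, China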

Source: Electric Power Construction (《电力建设》), 2024, No. 5, pp. 80-93 (14 pages)

Funding: National Natural Science Foundation of China (51977127).

Abstract: Soft open points can effectively mitigate the voltage fluctuations caused by the large-scale integration of distributed photovoltaics into distribution networks, but they also increase the degree of collaboration required between regions. In current multi-agent deep reinforcement learning approaches to voltage optimization, each agent is trained only on the rewards from its own region, so the agents lack coordination and the optimality of their output policies is hard to guarantee. To address this problem, a voltage optimization method for distribution networks that considers inter-regional auxiliary rewards is proposed. First, a multi-timescale voltage optimization framework based on multi-agent deep reinforcement learning is established. Second, for the agents that control soft open points, the reward within each agent's own region is defined as the primary reward, and the rewards within neighboring regions are defined as auxiliary rewards. Third, the benefit of the auxiliary rewards to training is analyzed through the dot product of the gradients of the primary and auxiliary reward loss functions with respect to the network parameters, and an evolutionary game method is used to adaptively adjust the auxiliary reward participation factor. Finally, tests on a modified IEEE 33-node system verify that the proposed method stabilizes agent training and improves the optimization performance of the agents' policies.
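The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the dot product of the primary-loss and auxiliary-loss gradients is used as a payoff signal, and that the participation factor follows a two-strategy replicator-dynamics step. The function names, the step size, and the clipping bounds are all hypothetical.

```python
import numpy as np


def gradient_alignment(g_main, g_aux):
    """Dot product of the primary-loss and auxiliary-loss gradients.

    A positive value suggests the auxiliary reward pushes the network
    parameters in a direction that also helps the primary objective.
    """
    return float(np.dot(np.ravel(g_main), np.ravel(g_aux)))


def update_participation(x, payoff, step=0.1, eps=1e-3):
    """One replicator-style step for a participation factor x in (0, 1).

    Assumed form: x <- x + step * x * (1 - x) * payoff, where payoff is
    the gradient dot product. The result is clipped to [eps, 1 - eps] so
    the factor never gets stuck exactly at 0 or 1.
    """
    x_new = x + step * x * (1.0 - x) * payoff
    return float(np.clip(x_new, eps, 1.0 - eps))


def combined_gradient(g_main, g_aux, x):
    """Primary gradient plus the auxiliary gradient weighted by x."""
    return np.asarray(g_main, dtype=float) + x * np.asarray(g_aux, dtype=float)


if __name__ == "__main__":
    g_main = np.array([1.0, 0.0])
    g_helpful = np.array([0.5, 0.5])      # partly aligned with g_main
    g_harmful = np.array([-1.0, 0.0])     # opposed to g_main

    x = 0.5
    x_up = update_participation(x, gradient_alignment(g_main, g_helpful))
    x_down = update_participation(x, gradient_alignment(g_main, g_harmful))
    print(x_up > x, x_down < x)
    print(combined_gradient(g_main, g_helpful, x_up))
```

In this sketch an aligned auxiliary gradient raises the participation factor and an opposed one lowers it, which is the qualitative behavior the abstract describes. The actual payoff definition and update rule would have to be taken from the full paper.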

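Keywords: multi-agent deep reinforcement learning; voltage optimization; auxiliary reward; evolutionary game; participation factor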

Classification: TM715 [Electrical Engineering: Power Systems and Automation]
