Authors: ZHOU Xiang (周祥); LI Xiaolu (李晓露); LIU Jinsong (柳劲松); LIN Shunfu (林顺富)
Affiliations: [1] College of Electrical Engineering, Shanghai University of Electric Power, Shanghai 200090, China; [2] Electric Power Research Institute, State Grid Shanghai Electric Power Company, Shanghai 200437, China
Source: Electric Power Construction (《电力建设》), 2024, Issue 5, pp. 80-93 (14 pages)
Funding: National Natural Science Foundation of China (51977127).
Abstract: Soft open points can effectively mitigate the voltage fluctuations caused by large-scale integration of distributed photovoltaics into distribution networks, but they also deepen the degree of collaboration between regions. At present, when multi-agent deep reinforcement learning algorithms are used for voltage optimization, each agent is trained using only the rewards from its own region, so the agents lack coordination and the optimality of the output strategies is difficult to guarantee. To address this problem, a distribution network voltage optimization method that considers inter-regional auxiliary rewards is proposed. First, a multi-timescale voltage optimization framework based on multi-agent deep reinforcement learning is established. Second, for the agents that control the soft open points, the reward within each agent's own region is defined as the primary reward, and the rewards within neighboring regions are defined as auxiliary rewards. Then, how beneficial the auxiliary rewards are to training is analyzed through the dot product of the gradients of the primary and auxiliary reward loss functions with respect to the network parameters, and an evolutionary game method is used to adaptively adjust the auxiliary reward participation factor. Finally, tests on a modified IEEE 33-node system verify that the proposed method stabilizes the agents' training process and improves the optimization performance of their strategies.
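The gradient dot-product test mentioned in the abstract can be illustrated with a small sketch. To first order, a small step against the auxiliary-loss gradient changes the primary loss by the negative of the step size times the dot product of the two gradients, so a positive dot product indicates that the auxiliary reward also helps the primary objective. The function names below are hypothetical, the gradients are plain flattened lists, and `update_factor` is a deliberately simple placeholder rule; it is not the evolutionary game update used in the paper, whose details are not given in this record.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two flattened gradient vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("gradient vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def auxiliary_helps(grad_primary: Sequence[float],
                    grad_auxiliary: Sequence[float]) -> bool:
    """True when the two loss gradients have a positive dot product,
    i.e. descending the auxiliary loss also descends the primary loss
    to first order."""
    return dot(grad_primary, grad_auxiliary) > 0.0


def combined_gradient(grad_primary: Sequence[float],
                      grad_auxiliary: Sequence[float],
                      factor: float) -> List[float]:
    """Primary gradient plus the auxiliary gradient scaled by a
    participation factor (assumed here to lie in [0, 1])."""
    return [gp + factor * ga for gp, ga in zip(grad_primary, grad_auxiliary)]


def update_factor(factor: float,
                  grad_primary: Sequence[float],
                  grad_auxiliary: Sequence[float],
                  step: float = 0.1) -> float:
    """Hypothetical placeholder rule (NOT the paper's evolutionary game):
    raise the factor when the gradients agree, lower it when they
    conflict, and clip the result to [0, 1]."""
    direction = 1.0 if auxiliary_helps(grad_primary, grad_auxiliary) else -1.0
    return min(1.0, max(0.0, factor + step * direction))
```

In a real training loop the two gradient vectors would come from backpropagating the primary and auxiliary losses separately through the same network; here they are passed in directly to keep the sketch self-contained.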
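Keywords: multi-agent deep reinforcement learning; voltage optimization; auxiliary reward; evolutionary game; participation factor
Classification: TM715 [Electrical Engineering: Power Systems and Automation]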