Active Power Correction Control of Power Grid Based on Improved Twin Delayed Deep Deterministic Policy Gradient Algorithm  Cited by: 12


Authors: Gu Xueping[1]; Liu Tong; Li Shaoyan[1]; Wang Tieqiang; Yang Xiaodong (School of Electrical & Electronic Engineering, North China Electric Power University, Baoding 071003, China; State Grid Hebei Electric Power Company, Shijiazhuang 050021, China)

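Affiliations: [1] School of Electrical and Electronic Engineering, North China Electric Power University, Baoding 071003; [2] State Grid Hebei Electric Power Company, Shijiazhuang 050021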

Source: Transactions of China Electrotechnical Society, 2023, No. 8, pp. 2162-2177 (16 pages)

Funding: Science and Technology Project of State Grid Corporation of China (SGTYHT/17-JS-199).

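Abstract: In the novel power system, source-load uncertainty increases the risk of line overload accidents, and traditional active power safety correction methods cannot effectively balance computation speed and correction effect. To address this, this paper proposes an active power safety correction control method for power grids based on an improved twin delayed deep deterministic policy gradient algorithm. First, subject to the static security constraints of the system, an active power safety correction control model is established with the objectives of minimizing the output adjustment of adjustable components and maximizing the overall operational security of the system. Second, a deep reinforcement learning framework for active power safety correction is constructed, defining a reward function that accounts for the objectives and constraints, observation states that reflect power system operation, regulating actions that can change the system state, and an agent based on the improved twin delayed deep deterministic policy gradient algorithm. Finally, historical system overload scenarios that consider source-load uncertainty are constructed, and the agent is trained through continuous interaction with the deep reinforcement learning model to obtain good decision-making performance; in online application, possible future values of sources and loads are taken into account to quickly obtain the optimal component adjustment scheme and eliminate the overloaded lines. Case study results on the IEEE 39-bus and IEEE 118-bus systems show that the proposed method effectively eliminates line overloads and avoids renewed limit violations within a short time, with clear advantages over traditional methods in computation speed and correction effect.

With the construction and development of the novel power system, the probability of line overload caused by component faults or source-load fluctuations has increased significantly. If the system cannot be corrected in a timely and effective manner, cascading faults may propagate faster and further and lead to a blackout. Timely and effective safety correction measures that eliminate power flow limit violations are therefore of great significance for the safe operation of the system. An active power safety correction control method is proposed based on the twin delayed deep deterministic policy gradient (TD3) algorithm. Firstly, an active power safety correction model is established. One objective is to minimize the sum of the absolute values of the adjustments of the adjustable components, and the other is to ensure the maximum safety of the system. Secondly, a deep reinforcement learning framework for active power safety correction is established, as shown in Fig. A1. The state expresses the characteristics of the power system. The action is the output of the adjustable components. The reward function comprises the objective function and constraint conditions of the active power safety correction model. The agent uses the TD3 algorithm. Finally, active power safety correction control is carried out based on the improved TD3 algorithm. Historical overload scenarios are constructed to pre-train the active power safety correction model based on the improved TD3 algorithm. To account for the influence of source-load fluctuation on the correction results during the correction process, the possible fluctuation value of the source-load output is calculated for each operating condition. During the online application, the predicted values of source and load in the next 5 minutes plus the prediction error value are used as the new energy output and the load value at the current time, which are input into the actor network together with the states of other system components. The impr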
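The abstract names two ingredients that a short sketch may make more concrete: a reward built from the adjustment objective and the overload constraint, and the TD3 learning target. The Python below is a minimal illustration only. It shows the standard TD3 target (clipped double-Q with target policy smoothing), not the paper's improved variant, whose details are not given in this record. The function names, the weights `w_adjust` and `w_overload`, and the 1.0 p.u. loading limit are all assumptions introduced for illustration.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def correction_reward(adjustments, loadings, w_adjust=1.0, w_overload=10.0):
    """Hypothetical reward: penalise total |adjustment| and line overload.

    adjustments: per-component output changes (assumed units: p.u.)
    loadings:    per-line loading ratios; values above 1.0 count as overload
    The weights are illustrative assumptions, not values from the paper.
    """
    adjust_cost = sum(abs(d) for d in adjustments)
    overload = sum(max(0.0, ratio - 1.0) for ratio in loadings)
    return -(w_adjust * adjust_cost + w_overload * overload)


def smoothed_target_action(action, low, high, sigma=0.2, noise_clip=0.5, rng=random):
    """Target policy smoothing: add clipped Gaussian noise, then clip to bounds."""
    smoothed = []
    for a, lo, hi in zip(action, low, high):
        noise = clip(rng.gauss(0.0, sigma), -noise_clip, noise_clip)
        smoothed.append(clip(a + noise, lo, hi))
    return smoothed


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               low, high, gamma=0.99, rng=random):
    """Standard TD3 target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    next_action = smoothed_target_action(target_actor(next_state), low, high, rng=rng)
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    return reward + gamma * (0.0 if done else 1.0) * q_min


if __name__ == "__main__":
    # Toy stand-ins for the target networks (assumptions, not trained models).
    actor = lambda s: [0.0, 0.0]
    q1 = lambda s, a: 5.0
    q2 = lambda s, a: 3.0
    r = correction_reward([0.1, -0.2], [0.9, 1.1])
    y = td3_target(r, False, [0.0], actor, q1, q2, [-1.0, -1.0], [1.0, 1.0], gamma=0.9)
    print(r, y)
```

Taking the minimum of the two target critics is what limits value overestimation in TD3. In a real implementation the lambdas above would be replaced by neural networks, and the reward terms by quantities computed from a power flow solution.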

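Keywords: novel power system; active power safety correction; deep reinforcement learning; improved twin delayed deep deterministic policy; optimal adjustment scheme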

Classification code: TM732 [Electrical Engineering: Power Systems and Automation]

 
