Decision-making Method for Transient Stability Emergency Control Based on Deep Reinforcement Learning

Cited by: 8

Authors: LI Honghao (李宏浩), ZHANG Pei (张沛) [1], LIU Zhao (刘曌) (School of Electrical Engineering, Beijing Jiaotong University, Beijing 100044, China)

Affiliation: [1] School of Electrical Engineering, Beijing Jiaotong University, Beijing 100044, China

Source: Automation of Electric Power Systems (《电力系统自动化》), 2023, No. 5, pp. 144-152 (9 pages)

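Funding: Fundamental Research Funds for the Central Universities (2021JBM027); Young Scientists Fund of the National Natural Science Foundation of China (52107068).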

Abstract: With the application of wide-area measurement systems in transient stability control, the random time delay of wide-area information makes the system state at the moment of control uncertain. In addition, the discrete decision variables for generator tripping and load shedding are of extremely high dimension, so online emergency control decision-making for power grids faces challenges. To address this, the transient stability emergency control problem is modeled as a Markov decision problem, and a decision-making method that combines deep Q-network (DQN) reinforcement learning with the transient energy function is proposed; its multi-step sequential decision-making process can cope with the time-delay uncertainty of emergency control. The reward function consists of a short-term reward that accounts for the control objectives and constraints and a long-term reward that accounts for stability, and the potential energy index of the transient energy function is introduced into the reward function to improve learning efficiency. With the objective of maximizing the cumulative reward, the DQN algorithm learns the optimal emergency control strategy in a discretized action space, thereby solving the transient stability emergency control problem. The effectiveness of the proposed method for emergency control decision-making is verified on the IEEE 39-bus system.
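The reward structure and learning target described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the action levels, the weights, the penalty and bonus values, the sign convention of the potential-energy term, and all function names. Only the temporal-difference target `r + gamma * max Q(s', a')` is the standard DQN form.

```python
from itertools import product

# Hypothetical discretized action space: each action pairs a generator-tripping
# level with a load-shedding level (fractions of capacity, illustrative values).
GEN_TRIP_LEVELS = (0.0, 0.5, 1.0)
LOAD_SHED_LEVELS = (0.0, 0.1, 0.2)
ACTIONS = list(product(GEN_TRIP_LEVELS, LOAD_SHED_LEVELS))


def short_term_reward(gen_trip, load_shed, constraint_violated,
                      w_gen=1.0, w_load=2.0, penalty=10.0):
    """Per-step reward: penalize control cost and constraint violations (assumed form)."""
    reward = -(w_gen * gen_trip + w_load * load_shed)
    if constraint_violated:
        reward -= penalty
    return reward


def long_term_reward(done, stable, bonus=100.0):
    """Terminal reward for stability, paid only when the episode ends (assumed form)."""
    if not done:
        return 0.0
    return bonus if stable else -bonus


def shaping_term(potential_energy_index, w_pe=1.0):
    """Assumed shaping: a lower potential-energy index yields a higher reward."""
    return -w_pe * potential_energy_index


def total_reward(action, constraint_violated, done, stable, potential_energy_index):
    """Sum of the short-term, long-term, and potential-energy terms."""
    gen_trip, load_shed = action
    return (short_term_reward(gen_trip, load_shed, constraint_violated)
            + long_term_reward(done, stable)
            + shaping_term(potential_energy_index))


def td_target(reward, next_q_values, done, gamma=0.99):
    """Standard DQN target: r + gamma * max_a' Q(s', a'); no bootstrap at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a full implementation, a neural network would supply `next_q_values` for every entry of `ACTIONS`, and a power system simulator would supply the stability flag and the potential-energy index; both are outside the scope of this sketch.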

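Keywords: deep reinforcement learning; transient stability; emergency control decision-making; transient energy function; deep Q-network (DQN) algorithm; time delay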

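Classification: TM712 [Electrical Engineering: Power Systems and Automation]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]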
