Research on Searching of Fault Sequence and Defense Strategy in Power Grid Based on Game Reinforcement Learning  (Cited by: 7)


Authors: DENG Xiangli, WANG Wei, LIU Shiming[3]

Affiliations: [1] School of Electric Power Engineering, Shanghai University of Electric Power, Yangpu District, Shanghai 200090, China; [2] Key Laboratory of Control of Power Transmission and Conversion (Shanghai Jiao Tong University), Ministry of Education, Minhang District, Shanghai 200240, China; [3] Key Laboratory of Power System Intelligent Dispatch and Control (Shandong University), Ministry of Education, Jinan 250061, Shandong Province, China

Source: Power System Technology (《电网技术》), 2021, No. 12, pp. 4856-4867 (12 pages)

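Funding: National Natural Science Foundation of China (5177070074); Open Project of the Key Laboratory of Control of Power Transmission and Conversion, Ministry of Education (2020AB01).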

Abstract: Existing methods for deriving the successive tripping sequence of a cascading failure do not account for how the dispatching strategy applied after each branch trip affects the grid, so the predicted tripping sequence can deviate from what actually occurs. This paper proposes a game-theoretic model for computing the successive tripping sequence that fully considers the interaction between branch trips and dispatching strategies, and uses it to obtain both the tripping sequence and the optimal dispatching strategy. Taking the contingency set of circuit breaker failures as the initial condition, the development of a cascading failure is described from the perspectives of an attacker (the grid fault) and a defender (the dispatching center). On the attacker side, the main consideration is the successive outage of branches caused by power flow transfer. On the defender side, sensitivity-based adjustment and adaptive adjustment of the operating characteristic of zone III distance protection serve as the defensive control strategies. Together these define a two-player multi-stage dynamic zero-sum game. Given an initial circuit breaker failure, all possible subsequent branch outage instances are used as data samples to build a value function, and Q-learning reinforcement learning is used to seek the Nash equilibrium of the two-player game. This yields the optimal successive tripping sequence of the cascading failure together with the optimal dispatching defense strategy, which can be applied to online defense of the power grid. A case study on the IEEE 39-bus system is presented, and the simulation results verify the correctness of the proposed two-player multi-stage dynamic zero-sum game model for online dispatching defense.
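As a rough illustration of the idea in the abstract, the sketch below runs tabular Q-learning on a toy two-stage attacker/defender game. It is not the paper's model: the branch set, the two defense actions, the loss table (`BASE_LOSS`, `MITIGATION`), the stage weighting, and the turn order (defender commits first, attacker responds) are all hypothetical assumptions chosen so that the minimax value can be taken over pure strategies. A simultaneous-move zero-sum game may in general need mixed strategies at equilibrium, which this sketch does not handle.

```python
import random
from collections import defaultdict

BRANCHES = (0, 1, 2)  # hypothetical branches the "attacker" (grid fault) may trip
DEFENSES = (0, 1)     # assumed labels: 0 = sensitivity-based adjustment,
                      #                 1 = adaptive zone III distance protection
N_STAGES = 2          # number of successive trips in one episode
GAMMA = 1.0           # no discounting over this short horizon

# Hypothetical load loss (MW) if a branch trips, and the fraction of that
# loss remaining under each defense action. These numbers are made up.
BASE_LOSS = {0: 30.0, 1: 50.0, 2: 40.0}
MITIGATION = {0: {0: 0.5, 1: 0.9, 2: 0.8},
              1: {0: 0.9, 1: 0.6, 2: 0.7}}


def loss(state, d, a):
    """Attacker reward (= defender cost); later trips hit a weaker grid."""
    return BASE_LOSS[a] * MITIGATION[d][a] * (1.0 + 0.5 * len(state))


def actions(state):
    """Branches that have not tripped yet."""
    return [a for a in BRANCHES if a not in state]


def is_terminal(state):
    return len(state) == N_STAGES


def state_value(Q, state):
    """Pure-strategy minimax: defender minimizes, attacker maximizes."""
    if is_terminal(state):
        return 0.0
    return min(max(Q[(state, d, a)] for a in actions(state)) for d in DEFENSES)


def train(episodes=5000, alpha=0.5, seed=0):
    """Q-learning with uniform random exploration over both players' actions."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        state = ()  # tuple of branches tripped so far
        while not is_terminal(state):
            d = rng.choice(DEFENSES)
            a = rng.choice(actions(state))
            nxt = state + (a,)
            target = loss(state, d, a) + GAMMA * state_value(Q, nxt)
            Q[(state, d, a)] += alpha * (target - Q[(state, d, a)])
            state = nxt
    return Q


def equilibrium_path(Q):
    """Greedy (defense, tripped branch) pairs under the learned Q-table."""
    state, path = (), []
    while not is_terminal(state):
        d = min(DEFENSES,
                key=lambda d: max(Q[(state, d, a)] for a in actions(state)))
        a = max(actions(state), key=lambda a: Q[(state, d, a)])
        path.append((d, a))
        state = state + (a,)
    return path


def exact_value(state=()):
    """Backward-induction minimax value, used only to check the learner."""
    if is_terminal(state):
        return 0.0
    return min(max(loss(state, d, a) + GAMMA * exact_value(state + (a,))
                   for a in actions(state)) for d in DEFENSES)


if __name__ == "__main__":
    Q = train()
    print("learned value:", state_value(Q, ()))
    print("exact value:  ", exact_value())
    print("equilibrium path (defense, branch):", equilibrium_path(Q))
```

The game is small enough to solve exactly by backward induction, so `exact_value` gives a direct check that the learned Q-table has converged. On a real network the reward would come from power flow calculations rather than a lookup table.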

Keywords: circuit breaker failure; cascading failure; Q-learning algorithm; multi-stage dynamic zero-sum game

Classification: TM721 [Electrical Engineering: Power Systems and Automation]

 
