Multi-agent Collaborative AGC Algorithm Based on Self-adaptive Reinforcement-exploration Maxmin Q  (Cited by: 2)


Authors: XI Lei; QUAN Yue; LIU Zhihong; XIE Lihui (College of Electrical Engineering and New Energy, China Three Gorges University, Yichang 443002, China; Hubei Provincial Key Laboratory for Operation and Control of Cascaded Hydropower Station, China Three Gorges University, Yichang 443002, China)

Affiliations: [1] College of Electrical Engineering and New Energy, China Three Gorges University, Yichang 443002; [2] Hubei Provincial Key Laboratory for Operation and Control of Cascaded Hydropower Station (China Three Gorges University), Yichang 443002

Source: High Voltage Engineering, 2023, No. 6, pp. 2286-2296 (11 pages)

Funding: National Natural Science Foundation of China (52277108).

Abstract: Under the carbon peaking and carbon neutrality targets, the large-scale integration of renewable energy and flexible loads is steadily raising the share of new energy in the power system. Traditional control methods cannot fully mobilize the source-grid-load-storage resources, so the control performance of the power grid keeps deteriorating. Therefore, from the perspective of automatic generation control (AGC), this paper proposes a multi-agent collaborative algorithm based on self-adaptive reinforcement-exploration Maxmin Q-learning to improve the control performance of the source-grid-load-storage collaborative system. By selecting the minimum action value among multiple action-value estimators, the adopted Maxmin Q-learning not only mitigates the action-value estimation bias of traditional Q-learning during action exploration, but also allows the estimation bias to be tuned from positive to negative, which improves control accuracy. Meanwhile, the introduced self-adaptive reinforcement-exploration strategy replaces the ε-greedy strategy of traditional Q-learning and avoids repetitive and unbalanced exploration. Simulations on an improved IEEE standard two-area load-frequency control model and a source-grid-load-storage collaborative system model verify the effectiveness of the proposed algorithm: compared with traditional reinforcement learning algorithms, it achieves higher CPS performance, smaller frequency deviation, smaller area control error, and faster convergence.
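The abstract only outlines the Maxmin Q-learning idea of bootstrapping from the element-wise minimum over several independent action-value estimators. A minimal tabular sketch of that core update might look as follows; the toy state/action sizes, learning rate, and the name `maxmin_update` are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

# Hypothetical toy setting: 4 states, 2 actions, 4 independent Q-tables.
N_ESTIMATORS, N_STATES, N_ACTIONS = 4, 4, 2
ALPHA, GAMMA = 0.1, 0.9
rng = np.random.default_rng(0)

# One Q-table per estimator, initialised to zero.
Q = np.zeros((N_ESTIMATORS, N_STATES, N_ACTIONS))

def maxmin_update(s, a, r, s_next):
    """One Maxmin Q-learning step: the bootstrap target is greedy with
    respect to the element-wise minimum over all estimators, which
    counteracts the overestimation bias of the max operator in plain
    Q-learning (and, with enough estimators, can push the bias negative)."""
    q_min = Q.min(axis=0)                     # pessimistic value estimate
    target = r + GAMMA * q_min[s_next].max()  # greedy w.r.t. the min-Q
    k = rng.integers(N_ESTIMATORS)            # update one estimator at random
    Q[k, s, a] += ALPHA * (target - Q[k, s, a])
    return target
```

Increasing the number of estimators makes the minimum more pessimistic, which is the knob the abstract refers to when it says the estimation bias can be controlled from positive to negative.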

Keywords: Maxmin Q-learning; automatic generation control; source-grid-load-storage; reinforcement learning; new energy; multi-agent

Classification: TM73 [Electrical Engineering—Power Systems and Automation]; TP181 [Automation and Computer Technology—Control Theory and Control Engineering]

 
