Research on Unit Commitment Optimization Method Based on Deep Reinforcement Learning  Cited by: 6


Authors: 陈准 (CHEN Zhun); 潘毅 (PAN Yi); 范士雄 (FAN Shixiong); 许丹 (XU Dan); 丁强 (DING Qiang); 蔡帜 (CAI Zhi)

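Affiliation: [1] Beijing Key Laboratory of Research and System Evaluation of Power Dispatching Automation Technology (China Electric Power Research Institute Co., Ltd.), Haidian District, Beijing 100192, China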

Source: Electric Power Information and Communication Technology (《电力信息与通信技术》), 2023, No. 3, pp. 33-40 (8 pages)

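Funding: Supported by the State Grid Corporation of China headquarters science and technology project "Research on Integrated Balance Analysis and Decision-Making Technology for Primary and Secondary Energy Oriented to Carbon Peaking and Carbon Neutrality Goals" (5100-202155294A-0-0-00).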

Abstract: To address the large-scale unit commitment optimization problem in power grids, this paper proposes a deep reinforcement learning method that combines a pointer network with reinforcement learning. First, a reinforcement learning environment for unit commitment is built with minimum generation cost as the objective function, taking full account of the constraints of the power system and of thermal power units. Second, for the optimization computation, a deep reinforcement learning method combining a pointer network with the Actor-Critic model is proposed, which forms a fast mapping from forecast data to unit on/off schedules and thereby solves the unit commitment problem quickly. Case studies on 10-unit and 200-unit systems over 24 time periods show that, compared with results from traditional mathematical programming methods, the proposed method obtains unit commitment results more quickly. Using the pointer network as the policy network of the reinforcement learning model strengthens the network's feature extraction ability and improves the accuracy of the computed results.
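The abstract does not give the network details, so the following is only a minimal sketch of the standard pointer-network attention step, under the assumption that a policy network of this kind scores each generating unit and turns the scores into a probability distribution over units. The function name `pointer_attention`, the feature values, and the weight matrices `W1`, `W2`, `v` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def pointer_attention(encoder_states, decoder_state, W1, W2, v, mask=None):
    """Pointer-style attention: a distribution over input positions.

    score_j = v . tanh(W1 e_j + W2 d), followed by a softmax over j.
    `mask[j] = True` excludes position j (assumes at least one is unmasked).
    All shapes and values here are illustrative assumptions.
    """

    def matvec(matrix, vector):
        return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

    projected_query = matvec(W2, decoder_state)
    scores = []
    for j, e_j in enumerate(encoder_states):
        if mask is not None and mask[j]:
            scores.append(float("-inf"))  # masked positions get zero probability
            continue
        projected_key = matvec(W1, e_j)
        hidden = [math.tanh(a + b) for a, b in zip(projected_key, projected_query)]
        scores.append(sum(vi * h for vi, h in zip(v, hidden)))

    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


# Hypothetical example: three units with two features each, hidden size 2.
units = [[0.2, 0.9], [0.5, 0.4], [0.8, 0.1]]
context = [0.6, 0.3]
W1 = [[0.5, -0.2], [0.1, 0.3]]
W2 = [[0.4, 0.0], [-0.3, 0.2]]
v = [1.0, -1.0]

# Unit 1 is masked out, e.g. because it was already selected.
probs = pointer_attention(units, context, W1, W2, v, mask=[False, True, False])
print(probs)
```

In an Actor-Critic setup, a distribution like `probs` would typically serve as the actor's action probabilities, while a separate critic estimates the expected cost; how the paper wires these together is not stated in the abstract.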

Keywords: unit commitment; deep reinforcement learning; pointer network; Actor-Critic model

Classification: TN915.853 [Electronics and Telecommunications: Communication and Information Systems]

 
