Intelligent control strategy of drawing window in top-coal caving based on mean deviation reward function

Cited by: 2

Authors: LUO Kai-cheng (罗开成); GAO Yang (高阳); YANG Yi (杨艺); CHANG Ya-jun (常亚军) [1,2]; YUAN Rui-fu (袁瑞甫)

Affiliations: [1] Zhengzhou Coal Mining Machinery Group Company Limited, Zhengzhou 450016, Henan, China; [2] Zhengzhou Coal Machine Hydraulic Control Group Company Limited, Zhengzhou 450016, Henan, China; [3] College of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo 454000, Henan, China; [4] State Collaborative Innovation Center of Coal Work Safety and Clean-efficient Utilization, Jiaozuo 454000, Henan, China

Source: Coal Engineering (《煤炭工程》), 2022, No. 9, pp. 105-111 (7 pages)

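Funding: National Key Research and Development Program of China (2018YFC0604502); Support Project of the Henan Province Innovation Center for Intelligent Coal Mining Technology (2021YD01); Henan Province Science and Technology Research Project (212102210390).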

Abstract: Based on the spatial layout of the hydraulic supports and the characteristics of the drawing-window action, the top-coal caving process is abstracted as a Markov decision process. Within a reinforcement learning framework, the Q-learning algorithm learns online the mapping between the occurrence state of the top coal and the drawing-window actions, without requiring a prepared set of training samples, and thereby determines the optimal window actions. To keep the coal-rock interface descending uniformly during caving, a reward function based on mean deviation is designed for Q-learning. A three-dimensional simulation platform for caving under continuous advance of the working face, built on the YADE discrete element method and running on Linux, is used to verify the algorithm. The results show that the control strategy learned with the mean deviation reward function makes the coal-rock interface descend more uniformly during top-coal caving. Under continuous-advance caving conditions, the intelligent caving process based on Q-learning with the mean deviation reward function reaches an average caving reward of 13467.8, which is 8.8% higher than the original Q-learning caving process and about 10% higher than traditional processes such as single-round sequential caving.

Keywords: fully mechanized mining; top-coal caving; intelligentization; reinforcement learning

Classification code: TD823. [Mining Engineering: Coal Mining]
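Illustrative note: the abstract does not give the reward formula or the state and action encoding, so the following is only a minimal sketch of how tabular Q-learning could be paired with a mean-deviation-style reward. Everything in it is an assumption made for illustration: the three-window toy environment, the `step` transition, the reward form (one unit per unit of coal drawn, minus the mean absolute deviation of the interface heights), and the hyperparameters. It is not the authors' implementation and does not use YADE.

```python
import random
from collections import defaultdict

ACTIONS = [0, 1, 2]   # hypothetical: index of the drawing window opened for one step
START = (3, 3, 3)     # hypothetical: top-coal thickness above each window


def mean_deviation_reward(heights):
    """Negative mean absolute deviation of interface heights (0 means perfectly flat)."""
    mean_h = sum(heights) / len(heights)
    return -sum(abs(h - mean_h) for h in heights) / len(heights)


def step(state, action):
    """Toy transition: opening a window lowers the interface above it by one unit."""
    heights = list(state)
    drawn = 0.0
    if heights[action] > 0:
        heights[action] -= 1
        drawn = 1.0
    next_state = tuple(heights)
    # Assumed reward: coal drawn this step plus the (non-positive) flatness term.
    reward = drawn + mean_deviation_reward(next_state)
    done = sum(next_state) == 0
    return next_state, reward, done


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update toward reward + gamma * max_a Q(s', a)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def train(episodes=2000, epsilon=0.2, seed=0, max_steps=50):
    """Learn online with an epsilon-greedy policy; no pre-collected samples are used."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            q_update(q, state, action, reward, next_state)
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    q_table = train()
    print(len(q_table), "state-action values learned")
```

In this sketch the flatness term penalizes any step that leaves the interface uneven, which is the intuition the abstract describes; the coal-drawn term is added only so that the toy agent has a reason to open a window at all.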
