Authors: LUO Kai-cheng (罗开成); GAO Yang (高阳); YANG Yi (杨艺); CHANG Ya-jun (常亚军) [1,2]; YUAN Rui-fu (袁瑞甫)
Affiliations: [1] Zhengzhou Coal Mining Machinery Group Company Limited, Zhengzhou 450016, Henan, China; [2] Zhengzhou Coal Machine Hydraulic Control Group Company Limited, Zhengzhou 450016, Henan, China; [3] College of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo 454000, Henan, China; [4] State Collaborative Innovation Center of Coal Work Safety and Clean-efficient Utilization, Jiaozuo 454000, Henan, China
Source: Coal Engineering (《煤炭工程》), 2022, No. 9, pp. 105-111 (7 pages)
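Funding: National Key Research and Development Program of China (2018YFC0604502); Support Project of the Henan Provincial Technology Innovation Center for Intelligent Coal Mining (2021YD01); Henan Provincial Science and Technology Research Project (212102210390).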
Abstract: Based on the spatial layout of the hydraulic supports and the characteristics of the caving-window action sequence, the top-coal caving process is abstracted as a Markov decision process. Within a reinforcement learning framework, the Q-learning algorithm learns online the mapping between the top-coal state and the caving-window actions, without requiring prepared training samples, and thereby determines the optimal window action. To keep the coal-rock boundary descending uniformly during caving, a reward function based on mean deviation is designed for Q-learning. A three-dimensional simulation platform for caving under continuous cutting of the working face, built on the YADE discrete element method, is set up on a Linux system to verify the algorithm. The experiments show that the window control policy learned with the mean-deviation reward function makes the coal-rock boundary descend more uniformly during top-coal caving. Under continuous cutting of the working face, the Q-learning caving process with the mean-deviation reward reaches an average caving reward of 13467.8, which is 8.8% higher than the original Q-learning caving process and about 10% higher than traditional processes such as single-round sequential caving.
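The abstract names the technique (tabular Q-learning with a mean-deviation reward) but this record does not give the formulas. The following Python sketch is therefore only an illustration under stated assumptions: the reward is taken to be the amount of coal drawn minus a weighted mean absolute deviation of the coal-rock boundary heights above each support, and the names `mean_deviation`, `reward`, `weight`, `q_update`, and `epsilon_greedy` are all hypothetical, not taken from the paper.

```python
import random
from collections import defaultdict


def mean_deviation(heights):
    """Mean absolute deviation of boundary heights from their mean.

    Zero when the coal-rock boundary is perfectly flat.
    """
    mean_h = sum(heights) / len(heights)
    return sum(abs(h - mean_h) for h in heights) / len(heights)


def reward(coal_drawn, heights, weight=1.0):
    """Assumed reward shape: coal recovered minus an unevenness penalty.

    The paper's actual reward function is not given in this record.
    """
    return coal_drawn - weight * mean_deviation(heights)


def q_update(q, state, action, r, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on a table keyed by (state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Usage: two actions per window (0 = keep closed, 1 = open), hypothetical states.
q = defaultdict(float)
actions = [0, 1]
r = reward(coal_drawn=5.0, heights=[0.0, 2.0])
q_update(q, "s0", 1, r, "s1", actions)
chosen = epsilon_greedy(q, "s0", actions, epsilon=0.0, rng=random.Random(0))
```

A flatter boundary yields a smaller penalty, so under this assumed reward the agent is pushed toward window actions that lower the boundary evenly, which is the behaviour the abstract reports.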