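Authors: 李金娜 (LI Jin-Na) [1]; 袁林 (YUAN Lin) [1]; 丁进良 (DING Jin-Liang) [2]
Affiliations: [1] School of Information and Control Engineering, Liaoning Petrochemical University, Fushun 113000 [2] State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819
Source: Acta Automatica Sinica (《自动化学报》), 2023, No. 2, pp. 461-472 (12 pages)
Funding: Supported by the National Key Research and Development Program of China (2018YFB1701104); the National Natural Science Foundation of China (62073158, 61673280, 61525302, 61833004); the Liaoning Revitalization Talents Program (XLYC1808001); the Science and Technology Program of Liaoning Province (2020JH2/10500001); the Key-Area Joint Open Fund of the Natural Science Foundation of Liaoning Province (2019-KF-03-06); and the Basic Scientific Research Project of the Education Department of Liaoning Province (LJKZ0401).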
Abstract: Deciding the operational indices is key to achieving safe operation of industrial processes and optimal production indices. Considering the complexity of decision making for multiple operational indices and the uncertainty in the state of the production indices caused by dynamically fluctuating production conditions, this paper proposes, for the first time, a reinforcement learning algorithm with asynchronous policy updating that self-learns the operational indices, and gives a theoretical proof of its convergence. Under the framework of stochastic adaptive dynamic programming, the algorithm uses the sample mean in place of computing the state transition probability matrix of the production indices, so that matrix does not need to be known a priori. In contrast to traditional synchronized policy updating, the algorithm introduces a time clock with a defined threshold and combines centralized policy evaluation with asynchronous updating of multiple policies, which simplifies the multi-index decision-making problem and improves the learning efficiency of reinforcement learning. Using measured data, the self-learned operational indices ensure optimized production indices while staying within the prescribed range. Finally, simulations using real data collected from a large-scale mineral processing plant in western China illustrate the effectiveness of the approach.
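The abstract's sample-mean idea can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the three-state example, the cost values, and the discount factor `gamma` are all assumptions chosen for illustration. It only shows that averaging observed transitions can approximate the expected one-step return that would otherwise need the transition probabilities.

```python
import random


def exact_backup(probs, costs, value, gamma=0.9):
    """Expected one-step return computed WITH known transition probabilities.

    probs[s], costs[s]: probability of, and cost incurred on, moving to state s
    (hypothetical toy quantities, not taken from the paper).
    """
    return sum(p * (c + gamma * value[s])
               for s, (p, c) in enumerate(zip(probs, costs)))


def sample_mean_backup(samples, value, gamma=0.9):
    """Estimate of the same quantity WITHOUT transition probabilities.

    samples: observed (cost, next_state) pairs; their sample mean replaces
    the expectation over the unknown transition distribution.
    """
    total = sum(cost + gamma * value[next_state] for cost, next_state in samples)
    return total / len(samples)


def draw_samples(probs, costs, n, seed=0):
    """Simulate n observed transitions (stands in for measured plant data)."""
    rng = random.Random(seed)
    states = rng.choices(range(len(probs)), weights=probs, k=n)
    return [(costs[s], s) for s in states]


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.3]   # assumed; unknown to the learner in practice
    costs = [0.5, 1.0, 2.0]   # assumed per-transition costs
    value = [1.0, 2.0, 4.0]   # assumed current value estimates
    samples = draw_samples(probs, costs, n=20000)
    print("exact      :", exact_backup(probs, costs, value))
    print("sample mean:", sample_mean_backup(samples, value))
```

With enough samples the two numbers should agree closely; the learner only ever calls `sample_mean_backup`, so it never needs `probs`.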