Asynchronous Updating Reinforcement Learning Algorithm for Decision-making Operational Indices of Uncertain Industrial Processes (不确定工业过程运行指标异步更新强化学习决策算法)

Authors: LI Jin-Na (李金娜); YUAN Lin (袁林); DING Jin-Liang (丁进良)

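Affiliations: [1] School of Information and Control Engineering, Liaoning Petrochemical University, Fushun 113000 [2] State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819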

Source: Acta Automatica Sinica (《自动化学报》), 2023, No. 2, pp. 461-472 (12 pages)

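Funding: Supported by the National Key Research and Development Program of China (2018YFB1701104); the National Natural Science Foundation of China (62073158, 61673280, 61525302, 61833004); the Liaoning Revitalization Talents Program (XLYC1808001); the Science and Technology Program of Liaoning Province (2020JH2/10500001); the Joint Open Fund for Key Fields of the Natural Science Foundation of Liaoning Province (2019-KF-03-06); and the Basic Scientific Research Project of the Education Department of Liaoning Province (LJKZ0401).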

Abstract: Deciding operational indices is key to achieving safe operation of industrial processes and optimal production indices. Considering the complexity of decision-making for multiple operational indices and the uncertainty of production index states caused by dynamic fluctuations of production conditions in industrial processes, this paper proposes, for the first time, a reinforcement learning algorithm with asynchronous policy updating for self-learning the operational indices, and gives a theoretical proof of its convergence. Under the framework of stochastic adaptive dynamic programming, the algorithm uses the sample mean instead of computing the state transition probability matrix of the production indices, so this matrix does not need to be known a priori. In contrast to traditional synchronized policy updating, the algorithm introduces a clock with a defined threshold and applies centralized policy evaluation with asynchronous updating of multiple policies, which simplifies solving the multiple-operational-index decision problem and improves the learning efficiency of reinforcement learning. Using measurable data, the self-learned operational indices can ensure the optimality of the production indices and keep them within the prescribed range. Finally, simulations using real data collected from a large-scale mineral processing plant in western China demonstrate the effectiveness of the approach.
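The abstract's central idea, replacing the state transition probability matrix with a sample mean, can be illustrated with a minimal sketch. This is not the paper's algorithm: the two-state setting, the function name `sample_mean_backup`, the reward, the discount factor `gamma`, and the value table are all hypothetical and chosen only for illustration. The true transition probabilities appear solely to simulate data; the estimator never reads them.

```python
import random


def sample_mean_backup(samples, value, gamma):
    """Estimate E[r + gamma * V(s')] by averaging over observed samples.

    samples: list of (reward, next_state) pairs observed from one state.
    value:   list mapping state index -> current value estimate.
    """
    total = 0.0
    for reward, next_state in samples:
        total += reward + gamma * value[next_state]
    return total / len(samples)


# Hypothetical two-state example (all numbers are illustrative assumptions).
rng = random.Random(0)          # fixed seed so the run is reproducible
true_probs = [0.7, 0.3]         # used only to simulate observations
value = [1.0, 5.0]              # assumed current value estimates V(s')
reward = 0.5                    # assumed constant one-step reward
gamma = 0.9                     # assumed discount factor

# Simulated measurements: the learner sees outcomes, not probabilities.
samples = [
    (reward, rng.choices([0, 1], weights=true_probs)[0])
    for _ in range(20000)
]

# Model-free estimate from data versus the model-based expectation.
estimate = sample_mean_backup(samples, value, gamma)
exact = reward + gamma * sum(p * v for p, v in zip(true_probs, value))

print(f"sample-mean estimate: {estimate:.3f}")
print(f"model-based value:    {exact:.3f}")
```

With enough samples the two quantities agree closely, which is why a method built on sample means does not need the transition matrix to be known in advance.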

Keywords: operational optimal control; reinforcement learning; data-driven control; adaptive dynamic programming; safe operation

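Classification codes: TP18 [Automation and Computer Technology: Control Theory and Control Engineering] TB497 [Automation and Computer Technology: Control Science and Engineering]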
