Industrial Process Control Method Based on Second-Order Value Gradient Model Reinforcement Learning (cited by: 1)


Authors: Zhang Bo [1,2,3,4], Pan Fucheng [1,2,3], Zhou Xiaofeng, Li Shuai

Affiliations: [1] Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, China; [2] Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China; [3] Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China; [4] University of Chinese Academy of Sciences, Beijing 100049, China

Source: Application Research of Computers (《计算机应用研究》), 2024, No. 8, pp. 2434-2440 (7 pages)

Funding: Basic Research Program of the Shenyang Institute of Automation, Chinese Academy of Sciences (2022000346).

Abstract: To achieve stable and accurate continuous control of complex industrial processes characterized by long time delays, nonlinearity, and strong coupling, this paper proposes a control method based on second-order value gradient model reinforcement learning. First, the method incorporates second-order gradient information of the state-value function into environment-model training, which gives more accurate function approximation, greater robustness, and more efficient learning iterations. Second, it adopts a new state sampling strategy that makes more efficient use of the learned model for policy learning. Finally, experiments in OpenAI Gym public environments and in simulated environments of two industrial scenarios show that, compared with a conventional model trained by maximum likelihood estimation, the second-order value gradient model significantly reduces the prediction error of the environment model. In addition, the resulting reinforcement learning method learns more efficiently than existing model-based policy optimization methods, achieves better control performance, and reduces oscillation during control. The method therefore effectively improves training efficiency while enhancing the stability and accuracy of industrial process control.
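To make the contrast between a maximum-likelihood model and a value-gradient model concrete, the sketch below compares two model losses on the same predictions. The MLE loss assumes a unit-variance Gaussian model, so it reduces to squared prediction error. The value-aware loss is an assumption in the spirit of value-gradient-weighted model learning: it approximates the value error V(ŝ') − V(s') by a second-order Taylor expansion around the true next state (gradient plus Hessian term) and penalizes its square. The function names `mle_loss`, `second_order_value_gradient_loss`, `grad_v`, `hess_v` and the toy value function are hypothetical; the abstract does not give the paper's actual loss, which may differ.

```python
import numpy as np


def mle_loss(pred_next: np.ndarray, true_next: np.ndarray) -> float:
    """Unit-variance Gaussian negative log-likelihood up to constants:
    mean squared prediction error, treating every state dimension equally."""
    diff = pred_next - true_next
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def second_order_value_gradient_loss(pred_next, true_next, grad_v, hess_v) -> float:
    """Hypothetical value-aware loss (an assumption, not the paper's formula).

    Approximates V(pred) - V(true) by g^T d + 0.5 * d^T H d, with d = pred - true,
    g = grad V(true), H = Hessian V(true), and penalizes its square, so model
    errors count only insofar as they change the predicted value.
    """
    losses = []
    for p, t in zip(pred_next, true_next):
        d = p - t
        g = grad_v(t)  # first-order term: value gradient at the true next state
        h = hess_v(t)  # second-order term: value Hessian at the true next state
        value_error = g @ d + 0.5 * d @ h @ d
        losses.append(value_error ** 2)
    return float(np.mean(losses))


# Toy value function V(s) = -||s||^2 (hypothetical) with analytic derivatives.
def v(s):
    return -float(s @ s)


def grad_v(s):
    return -2.0 * s


def hess_v(s):
    return -2.0 * np.eye(len(s))


true_next = np.array([[1.0, 0.0], [0.0, 2.0]])
pred_next = true_next + np.array([[0.1, 0.0], [0.0, -0.1]])

mle = mle_loss(pred_next, true_next)
svg = second_order_value_gradient_loss(pred_next, true_next, grad_v, hess_v)
# For a quadratic V the second-order expansion is exact, so svg equals this.
exact = float(np.mean([(v(p) - v(t)) ** 2 for p, t in zip(pred_next, true_next)]))
print(f"MLE: {mle:.4f}  second-order value-gradient: {svg:.4f}  exact: {exact:.4f}")


# If V ignores the second state dimension, errors along it cost nothing under
# the value-aware loss but are still penalized by the MLE loss.
def flat_grad(s):
    return np.array([-2.0 * s[0], 0.0])


def flat_hess(s):
    return np.diag([-2.0, 0.0])


off_axis_pred = true_next + np.array([[0.0, 0.3], [0.0, 0.3]])
flat_svg = second_order_value_gradient_loss(off_axis_pred, true_next, flat_grad, flat_hess)
flat_mle = mle_loss(off_axis_pred, true_next)
print(f"flat-direction errors -> value-aware: {flat_svg:.4f}, MLE: {flat_mle:.4f}")
```

The toy value function stands in for a learned critic; in a real model-based RL loop the gradient and Hessian would more likely come from automatic differentiation of the critic network, and the abstract does not specify how the authors obtain or approximate them.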

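Keywords: industrial process control; model-based reinforcement learning; second-order value gradient; state-value function; state sampling strategy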

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
