Authors: DENG Shaobin; ZHU Jun[1,2,3]; ZHOU Xiaofeng[1,2,3]; LI Shuai[1,2,3,4]; LIU Shurui[1,2,3]
Affiliations: [1] Key Laboratory of Networked Control System, Chinese Academy of Sciences, Shenyang, Liaoning 110016, China; [2] Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, Liaoning 110169, China; [3] Institutes for Robotics and Intelligent Manufacturing Innovation, Chinese Academy of Sciences, Shenyang, Liaoning 110169, China; [4] University of Chinese Academy of Sciences, Beijing 100049, China
Source: Journal of Computer Applications (《计算机应用》), 2022, No. 5, pp. 1642-1648 (7 pages)
Funding: Liaoning Province "Xingliao Talents Program" (兴辽英才计划) project (XLYC1808009).
Abstract: To achieve stable and precise control of industrial processes that exhibit non-linearity, hysteresis, and strong coupling, a control method based on Local Policy Interaction Exploration-based Deep Deterministic Policy Gradient (LPIE-DDPG) was proposed for continuous control with deep reinforcement learning. Firstly, the Deep Deterministic Policy Gradient (DDPG) algorithm was used as the control strategy, which greatly reduces overshoot and oscillation in the control process. At the same time, the control strategy of the original controller was used as a local policy for searching, and learning followed an interactive exploration rule, which improved learning efficiency and stability. Finally, a penicillin fermentation process simulation platform was built under the Gym framework and experiments were carried out on it. Simulation results show that, compared with DDPG, LPIE-DDPG improves convergence efficiency by 27.3%; compared with proportional-integral-derivative (PID) control, LPIE-DDPG shows less overshoot and oscillation in temperature control and raises the penicillin concentration (yield) by 3.8%. The proposed method can therefore improve training efficiency while also improving the stability of industrial process control.
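The abstract does not spell out the interactive exploration rule, so the following is only a minimal sketch of the general idea of letting an existing controller guide exploration. It assumes a textbook discrete PID controller as the "original controller" and a hypothetical rule that follows the local policy with probability `p_local` and otherwise uses the actor's action plus Gaussian noise. The names `PID`, `explore_action`, `p_local`, and `noise_std` are illustrative and are not taken from the paper.

```python
import random


class PID:
    """Textbook discrete PID controller (stand-in for the original controller)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def act(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        # No derivative term on the first call, since there is no previous error.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def explore_action(actor_action, local_action, p_local, noise_std, low, high, rng):
    """Hypothetical exploration rule (an assumption, not the paper's rule).

    With probability p_local the agent follows the local (PID) policy;
    otherwise it uses the actor's action perturbed by Gaussian noise.
    The result is clipped to the valid action range [low, high].
    """
    if rng.random() < p_local:
        action = local_action
    else:
        action = actor_action + rng.gauss(0.0, noise_std)
    return min(max(action, low), high)


if __name__ == "__main__":
    rng = random.Random(0)
    pid = PID(kp=2.0, ki=0.5, kd=0.0, dt=1.0)
    local = pid.act(setpoint=1.0, measurement=0.0)
    print(explore_action(0.3, local, p_local=0.5, noise_std=0.1,
                         low=-5.0, high=5.0, rng=rng))
```

In a DDPG training loop, the action returned by `explore_action` would be sent to the environment and the resulting transition stored in the replay buffer as usual. How the mixing between the local policy and the actor is scheduled during training is a design choice that this sketch leaves open.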
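Keywords: industrial process control; deep reinforcement learning; Deep Deterministic Policy Gradient (DDPG); local policy interaction exploration; penicillin fermentation process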
Classification number: TP273.2 [Automation and Computer Technology: Detection Technology and Automatic Devices]