Energy-saving process route discovery method based on deep reinforcement learning

Authors: TAO Xinyu, WANG Yan, JI Zhicheng [1,2]

Affiliations: [1] Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi 214122, Jiangsu, China; [2] School of the Internet of Things Engineering, Jiangnan University, Wuxi 214122, Jiangsu, China

Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2023, No. 1, pp. 23-35 (13 pages)

Funding: National Key Research and Development Program of China (2018YFB1701903).

Abstract: Traditional process route formulation rules assume a fixed processing environment, so they cannot respond quickly to dynamic changes in that environment when formulating energy-saving process routes. This paper therefore proposes an energy-saving process route discovery method based on the deep Q network (DQN). Based on the Markov decision process, the state vector, action space, and reward function are defined and an energy-saving process route model is established. The problem of planning energy-saving process routes under a dynamically changing processing environment is thereby transformed into a DQN agent decision-making problem, which is solved by exploiting the reusability and extensibility of decision-making experience. To improve the convergence speed and solution quality of DQN, an exploration mechanism based on the S function and a weighted experience pool are proposed, and a double Q network is used. Simulation results show that, compared with the algorithm before improvement, the improved algorithm finds energy-saving process routes faster and better in a dynamic processing environment. Compared with the genetic algorithm, the simulated annealing algorithm, and the particle swarm algorithm, the improved algorithm not only discovers energy-saving process routes fastest but also obtains solutions of equal or even higher precision.
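Two of the DQN improvements named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the logistic form of the S-shaped exploration schedule, the function names `sigmoid_epsilon` and `double_q_target`, and all parameter values (`eps_min`, `eps_max`, `steepness`, `gamma`). The weighted experience pool is not sketched, since the abstract does not say how the weights are defined.

```python
import math


def sigmoid_epsilon(step, total_steps, eps_min=0.05, eps_max=1.0, steepness=10.0):
    """Exploration rate that decays along an S-shaped (logistic) curve.

    Assumed form: the abstract only states that exploration is based on
    an S function, not which one or how it is parameterized.
    """
    progress = step / total_steps  # 0.0 at the start of training, 1.0 at the end
    # Logistic curve: close to 1 early (explore), close to 0 late (exploit).
    s = 1.0 / (1.0 + math.exp(steepness * (progress - 0.5)))
    return eps_min + (eps_max - eps_min) * s


def double_q_target(reward, next_q_online, next_q_target, done, gamma=0.9):
    """Standard double-Q learning target for one transition.

    The online network selects the next action and the target network
    evaluates it, which reduces the overestimation of plain Q-learning.
    """
    if done:
        return reward  # terminal state: no bootstrapped future value
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

With these assumed defaults the exploration rate starts near `eps_max`, passes through the midpoint of the range halfway through training, and flattens out near `eps_min`. In `double_q_target`, the chosen action is the one the online estimates rank highest, even when the target estimates would rank another action higher.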

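Keywords: deep reinforcement learning; deep Q network; dynamic processing environment; process route; Markov decision process; agent decision-making; double Q network; heuristic algorithm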

Classification: TP273 [Automation and Computer Technology: Detection Technology and Automatic Devices]
