Real-time Scheduling Method for Just-in-Time Material Handling Systems Based on Improved Reinforcement Learning

Real-time Scheduling Method Based on Reinforcement Learning for Material Handling in Assembly Lines

Authors: XIA Beixin (夏蓓鑫); GU Jiayi (顾嘉怡); TIAN Tong (田童); YUAN Jie (袁杰); PENG Yunfang (彭运芳) [1]

Affiliation: [1] School of Management, Shanghai University, Shanghai 200444, China

Source: Operations Research and Management Science (《运筹与管理》), 2024, No. 6, pp. 71-77 (7 pages)

Funding: National Natural Science Foundation of China (71801147); Shanghai Pujiang Talent Program (22PJC051).

Abstract: A punctual and efficient material handling system keeps assembly manufacturing running continuously and stably. To respond dynamically to changes in assembly-line state and to balance the production efficiency and energy consumption of mixed-flow assembly, this paper proposes a reinforcement learning scheduling model based on the Q-learning algorithm. It designs the system state, action strategy, and reward function, introduces a neural network to generalize and approximate the Q-value function, and improves the policy selection mechanism, yielding a reinforcement learning dynamic scheduling method based on a two-parameter greedy strategy. Simulation results show that, compared with other scheduling methods, this reinforcement learning approach optimizes material handling scheduling more effectively: it reduces handling distance while ensuring that materials reach the assembly line on time and that maximum output is achieved.

The scheduling of the workshop material handling system is an important part of the production control system in a manufacturing enterprise's flow workshop. Timely and efficient material scheduling can effectively improve production efficiency and economic benefits. In actual production, random events may occur that make the workshop material handling system dynamic. In order to respond dynamically to changes in the state of the assembly line and effectively balance the production efficiency and energy consumption of mixed-flow assembly, this paper proposes a reinforcement learning scheduling model based on the Q-learning algorithm. The real-time state information of the manufacturing system includes all the state characteristic information of the system at a given moment. Because the complexity of the system makes it difficult to cover all system states, and in order to simplify the model, keep the decision-making model accurate, and solve it effectively with reinforcement learning, this paper selects the current real-time information, the forward-looking information of the system, and the slack time of each part as the system state characteristics used in the scheduling decision model. It sets up five action groups according to the number of transported parts and the transport sequence of multiple parts. The calculation of the transport scheduling plan for each action group of a multi-load trolley is divided into three steps: selecting the transport task, calculating the start time, and coordinating the start time point. The reward and punishment function fed back by the system covers three dimensions: out-of-stock time, handling distance, and line-side inventory of parts. These are given different weights according to the optimization goal, in order to realize the multi-objective optimization of minimizing the travel distance of multi-load trolleys and the line-side inventory of each part while satisfying the on-time delivery of parts on the assembly line as far as possible. In order to solve the problem
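As an illustration of the reward design and learning rule described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes the reward is a negative weighted sum of the three named dimensions (out-of-stock time, handling distance, line-side inventory) and applies the standard tabular Q-learning update over five action groups. The weight values, the learning rate `alpha`, the discount factor `gamma`, the state labels, and the function names are all hypothetical. The paper approximates the Q-value function with a neural network, whereas a lookup table is used here only to keep the example short.

```python
from collections import defaultdict

# Hypothetical weights for the three reward dimensions named in the abstract.
W_STOCKOUT, W_DISTANCE, W_INVENTORY = 0.6, 0.3, 0.1

# Five action groups, as stated in the abstract (their contents are not modelled here).
ACTIONS = range(5)


def reward(stockout_time, handling_distance, line_inventory):
    """Negative weighted sum: each term is a cost, so smaller costs give larger rewards."""
    return -(W_STOCKOUT * stockout_time
             + W_DISTANCE * handling_distance
             + W_INVENTORY * line_inventory)


def q_update(q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning step and return the updated Q-value."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Example step: no stock-out, 10 distance units travelled, 5 units of line-side inventory.
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
r = reward(stockout_time=0.0, handling_distance=10.0, line_inventory=5.0)
new_value = q_update(q_table, state="s0", action=2, r=r, next_state="s1")
```

Weighting out-of-stock time most heavily reflects the abstract's priority on on-time delivery; the actual weights used in the paper are not given in this record.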

Keywords: workshop material handling system; reinforcement learning; Q-learning; hybrid strategy

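Classification: C935 [Economics and Management - Management Science]; TP18 [Automation and Computer Technology - Control Theory and Control Engineering]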
