Authors: 夏蓓鑫 (XIA Beixin), 顾嘉怡 (GU Jiayi), 田童 (TIAN Tong), 袁杰 (YUAN Jie), 彭运芳 (PENG Yunfang) [1]
Affiliation: [1] School of Management, Shanghai University, Shanghai 200444, China
Source: 《运筹与管理》 (Operations Research and Management Science), 2024, No. 6, pp. 71-77 (7 pages)
Funding: National Natural Science Foundation of China (71801147); Shanghai Pujiang Talent Program (22PJC051).
Abstract: A punctual, efficient material handling system keeps assembly manufacturing running continuously and stably. To respond dynamically to changes in the state of the assembly line and to balance production efficiency against energy consumption in mixed-flow assembly, this paper proposes a reinforcement learning scheduling model based on the Q-learning algorithm. It designs the model's system state, action strategy, and reward function, introduces a neural network to generalize and approximate the Q-value function, and improves the policy selection mechanism, yielding a reinforcement learning dynamic scheduling method based on a dual-parameter greedy strategy. Simulation experiments show that, compared with other scheduling methods, this reinforcement learning approach optimizes material handling scheduling better: it reduces handling distance while ensuring that materials reach the assembly line on time and that output is maximized.

The scheduling of the workshop material handling system is an important part of the production control system of a manufacturing enterprise's flow workshop. Timely and efficient material scheduling can effectively improve production efficiency and economic benefits. In actual production, random events may occur that make the workshop material handling system dynamic. To respond dynamically to changes in the state of the assembly line and to balance the production efficiency and energy consumption of mixed-flow assembly, this paper proposes a reinforcement learning scheduling model based on the Q-learning algorithm. The real-time state information of the manufacturing system includes all of the system's state features at a given moment. Because the system is too complex for every state to be covered, and in order to simplify the model, preserve the accuracy of the decision-making model, and make reinforcement learning applicable, this paper selects the system's current real-time information, its forward-looking information, and the slack time of each part as the state features used in the scheduling decision model. It sets up five action groups according to the number of parts transported and the transport sequence of multiple parts. The calculation of the transport scheduling plan for each action group of a multi-load trolley is divided into three steps: selecting the transport task, calculating the start time, and coordinating the start time point. The reward and punishment function fed back by the system covers three dimensions: out-of-stock time, handling distance, and line-side part inventory. These are given different weights according to the optimization goal, in order to realize the multi-objective optimization of minimizing the travel distance of multi-load trolleys and the line-side inventory of each part while satisfying the on-time delivery of parts to the assembly line as far as possible. In order to solve the problem
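The three-term weighted reward and the choice among five action groups described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weight values, the function names, and the state keys are hypothetical, a plain tabular Q-table stands in for the neural-network approximation, and a single-parameter epsilon-greedy rule stands in for the dual-parameter greedy strategy, whose details the abstract does not give.

```python
import random
from collections import defaultdict

N_ACTIONS = 5  # the abstract mentions five action groups

# Hypothetical weights: the abstract only says the three terms are
# weighted according to the optimization goal, not what the weights are.
WEIGHTS = {"stockout_time": 1.0, "distance": 0.1, "inventory": 0.05}


def reward(stockout_time, distance, inventory, weights=WEIGHTS):
    """Negative weighted cost: smaller penalties give a larger reward."""
    return -(weights["stockout_time"] * stockout_time
             + weights["distance"] * distance
             + weights["inventory"] * inventory)


def make_q_table():
    """Tabular Q-values, one list of N_ACTIONS entries per state (assumed simplification)."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def choose_action(q_table, state, epsilon, rng=random):
    """Plain epsilon-greedy: explore with probability epsilon, else pick the best action."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q_table[state]
    return max(range(N_ACTIONS), key=lambda a: values[a])


def q_update(q_table, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update toward r + gamma * max_a Q(next_state, a)."""
    target = r + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])
```

A training loop would call `choose_action` on the current state, apply the chosen action group in a simulator, compute `reward` from the observed out-of-stock time, handling distance, and line-side inventory, and then call `q_update`. Raising the weight on one term shifts the learned policy toward that objective.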
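Classification: C935 [Economics and Management: Management Science]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]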