Author(s): WANG Ling [1]; PAN Zi-xiao (Department of Automation, Tsinghua University, Beijing 100084, China)
Affiliation: [1] Department of Automation, Tsinghua University, Beijing 100084, China
Source: Control and Decision (《控制与决策》), 2021, No. 11, pp. 2609-2617 (9 pages)
Funding: National Science Fund for Distinguished Young Scholars (61525304); National Natural Science Foundation of China (61873328)
Abstract: Flow-shop scheduling is among the most widely applied scheduling problems, and research on intelligent algorithms for it has important academic significance and application value. With the criterion of minimizing the maximum completion time (makespan), a framework based on deep reinforcement learning and an iterative greedy method is proposed for solving the permutation flow-shop scheduling problem. First, a new encoding network is designed to model the problem, avoiding the limitation of classic models whose generalization is restricted by problem scale, and reinforcement learning is used to train the model to yield high-quality output. Then, an iterative greedy algorithm with a feedback mechanism is proposed, which takes the output of the trained network as its initial solution; multiple local search operators are applied collaboratively, and their usage is adjusted according to performance feedback to obtain the final schedule. Simulation results and statistical comparisons show that the proposed algorithm fusing deep reinforcement learning with the iterative greedy method achieves better performance.
Keywords: flow-shop scheduling; deep reinforcement learning; iterative greedy algorithm; feedback-based collaboration mechanism
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
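The abstract outlines two components: a trained encoding network that produces an initial job permutation, and an iterative (iterated) greedy algorithm that refines it while adjusting its local operators via performance feedback. For reference, below is a minimal Python sketch of a generic iterated greedy loop for permutation flow-shop makespan minimization. It is not the paper's method: the initial permutation is passed in as an argument (in the paper it would come from the trained network), a single destruction-reinsertion operator stands in for the paper's feedback-controlled set of local operators, and all function names and parameter values (d, iters, temp) are hypothetical.

```python
import math
import random

def makespan(perm, p):
    """Makespan (maximum completion time) of a job permutation on a
    permutation flow shop; p[j][k] is the time of job j on machine k."""
    m = len(p[0])
    c = [0] * m  # completion time of the last scheduled job on each machine
    for j in perm:
        c[0] += p[j][0]
        for k in range(1, m):
            c[k] = max(c[k], c[k - 1]) + p[j][k]
    return c[-1]

def best_insertion(partial, job, p):
    """Insert `job` at the position of `partial` giving the smallest makespan."""
    best_perm, best_val = None, float("inf")
    for i in range(len(partial) + 1):
        cand = partial[:i] + [job] + partial[i:]
        val = makespan(cand, p)
        if val < best_val:
            best_perm, best_val = cand, val
    return best_perm, best_val

def iterated_greedy(p, init_perm, d=2, iters=500, temp=1.0, seed=0):
    """Generic iterated greedy: repeatedly remove d jobs, greedily reinsert
    them, and accept worse solutions with a constant-temperature criterion."""
    rng = random.Random(seed)
    cur = list(init_perm)
    cur_val = makespan(cur, p)
    best, best_val = cur[:], cur_val
    for _ in range(iters):
        removed = rng.sample(cur, d)                    # destruction
        partial = [j for j in cur if j not in removed]
        for j in removed:                               # construction
            partial, val = best_insertion(partial, j, p)
        if val < cur_val or rng.random() < math.exp((cur_val - val) / temp):
            cur, cur_val = partial, val                 # acceptance
            if cur_val < best_val:
                best, best_val = cur[:], cur_val
    return best, best_val

if __name__ == "__main__":
    # 5 jobs x 3 machines with random processing times; the identity permutation
    # stands in for the output of the paper's trained encoding network.
    gen = random.Random(1)
    p = [[gen.randint(1, 20) for _ in range(3)] for _ in range(5)]
    perm, cmax = iterated_greedy(p, init_perm=range(5), d=2, iters=200)
    print("best permutation:", perm, "makespan:", cmax)
```

In the paper's framework, the single destruction-reinsertion step above would be replaced by a pool of local operators whose usage is adjusted according to the performance feedback each operator delivers.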