Authors: XU Shuai, LI Yanwu, XIE Hui[1], NIU Xiaowei[1]
Affiliation: [1] College of Electronic & Information Engineering, Chongqing Three Gorges University, Chongqing 404020, China
Source: Modern Manufacturing Engineering (《现代制造工程》), 2025, No. 3, pp. 19-30 (12 pages)
Funding: National Natural Science Foundation of China, General Program (12175194); Science and Technology Research Program of the Chongqing Municipal Education Commission (KJQN202301216, KJQN202001224).
Abstract: The job-shop scheduling problem is a classic NP-hard combinatorial optimization problem, and the quality of its schedules directly affects the operating efficiency of a manufacturing system. To obtain a better scheduling policy, with minimization of the maximum completion time (makespan) as the objective, a Deep Reinforcement Learning (DRL) scheduling method based on Proximal Policy Optimization (PPO) and a Convolutional Neural Network (CNN) is proposed. A three-channel state representation is designed, 16 heuristic dispatching rules are selected as the action space, and the reward function is constructed to be equivalent to minimizing the total machine idle time. So that the trained policy can handle scheduling instances of different sizes, Spatial Pyramid Pooling (SPP) is used in the CNN to convert feature matrices of different dimensions into fixed-length feature vectors. Computational experiments are conducted on 42 Job-Shop Scheduling Problem (JSSP) instances from the public OR-Library. The simulation results show that the proposed algorithm outperforms single heuristic dispatching rules and a genetic algorithm, obtains better results than existing deep reinforcement learning algorithms on most instances, and achieves the smallest average makespan.
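The abstract states that SPP turns feature matrices of different dimensions into fixed-length vectors. The following is a minimal sketch of that idea only, not the authors' implementation: it max-pools a single-channel 2D feature map over a pyramid of n x n grids. The function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), and the example instance sizes are all assumptions made for illustration; the record does not give the paper's actual levels or network layout.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map into a fixed-length vector.

    feature_map: non-empty list of equal-length rows (one channel).
    levels: hypothetical pyramid levels; level n splits the map into n x n bins.
    The output length is sum(n * n for n in levels), independent of input size.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # floor/ceil bin edges guarantee every bin covers at least one cell
            r0 = (i * rows) // n
            r1 = math.ceil((i + 1) * rows / n)
            for j in range(n):
                c0 = (j * cols) // n
                c1 = math.ceil((j + 1) * cols / n)
                pooled.append(
                    max(feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1))
                )
    return pooled


# Two illustrative feature maps of different sizes (assumed: jobs x machines).
small = [[r * 5 + c for c in range(5)] for r in range(6)]
large = [[r * 10 + c for c in range(10)] for r in range(15)]
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Both calls return vectors of the same length, so one fully connected head could in principle follow the pooling layer regardless of instance size. In a real CNN the same pooling would be applied per channel and the results concatenated.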
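Keywords: deep reinforcement learning; job-shop scheduling; convolutional neural network; proximal policy optimization; spatial pyramid pooling
Classification No.: TP301.6 [Automation and Computer Technology: Computer System Architecture]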