The PPO algorithm based on convolutional pyramid network to solve job-shop scheduling problem

Authors: XU Shuai, LI Yanwu, XIE Hui, NIU Xiaowei

Affiliation: College of Electronic & Information Engineering, Chongqing Three Gorges University, Chongqing 404020, China

Source: Modern Manufacturing Engineering, 2025, No. 3, pp. 19-30 (12 pages)

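Funding: General Program of the National Natural Science Foundation of China (12175194); Science and Technology Research Program of the Chongqing Municipal Education Commission (KJQN202301216, KJQN202001224).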

Abstract: The job-shop scheduling problem (JSSP) is a classic NP-hard combinatorial optimization problem, and the quality of a schedule directly affects the operational efficiency of a manufacturing system. To obtain better scheduling policies with the objective of minimizing the makespan (maximum completion time), a deep reinforcement learning (DRL) scheduling method based on proximal policy optimization (PPO) and a convolutional neural network (CNN) is proposed. A three-channel state representation is designed, 16 heuristic scheduling rules are selected as the action space, and the reward function is designed to be equivalent to minimizing the total idle time of the machines. To let the trained scheduling policy handle instances of different sizes, spatial pyramid pooling (SPP) is applied in the CNN to convert feature matrices of different dimensions into fixed-length feature vectors. Computational experiments are conducted on 42 JSSP instances from the public OR-Library. The simulation results show that the proposed algorithm outperforms single heuristic scheduling rules and a genetic algorithm, achieves better results than existing DRL algorithms on most instances, and attains the smallest average completion time (makespan).
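The SPP step in the abstract is what lets one network accept instances with different numbers of jobs and machines. The sketch below is a minimal NumPy illustration of standard spatial pyramid pooling, not the authors' implementation: the pyramid levels `(1, 2, 4)`, the use of max pooling, the 16-channel feature maps, and the helper name `spatial_pyramid_pool` are all assumptions, since the abstract gives none of these details. Each level n splits a (C, H, W) feature map into an n × n grid of bins and max-pools each bin, so the output length is C · Σn², independent of H and W.

```python
import math

import numpy as np


def spatial_pyramid_pool(features: np.ndarray, levels=(1, 2, 4)) -> np.ndarray:
    """Max-pool a (C, H, W) feature map into a vector of length C * sum(n * n).

    The output length depends only on C and the pyramid levels, not on H or W.
    Hypothetical helper: the levels and the choice of max pooling are assumptions.
    """
    _, height, width = features.shape
    pooled = []
    for n in levels:
        # Split the map into an n x n grid; round each bin's start down and
        # its end up, so every bin is non-empty and the grid covers the map.
        for i in range(n):
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                # One max value per channel for this bin
                pooled.append(features[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)


# Usage: two instance sizes map to vectors of the same length.
rng = np.random.default_rng(0)
small = rng.random((16, 6, 6))    # e.g. 16 feature maps for a 6-job x 6-machine instance
large = rng.random((16, 20, 15))  # e.g. 16 feature maps for a 20-job x 15-machine instance
print(spatial_pyramid_pool(small).shape)  # (336,)
print(spatial_pyramid_pool(large).shape)  # (336,)
```

Both maps yield 336 = 16 × (1 + 4 + 16) values, so any fully connected layers that follow (for example, PPO actor and critic heads) can keep a fixed input size regardless of instance scale. Rounding bin edges outward keeps every bin non-empty even when a feature map is smaller than the finest grid, at the cost of some overlap between neighbouring bins.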

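Keywords: deep reinforcement learning; job-shop scheduling; convolutional neural network; proximal policy optimization; spatial pyramid pooling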

CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
