A Four Directional Cooperative Three-dimensional Packing Method Based on Deep Reinforcement Learning

Authors: YIN Hao; CHEN Fan [2]; HE Hong-Jie [1]

Affiliations: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756; [2] School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756

Source: Acta Automatica Sinica, 2024, No. 12, pp. 2420-2431 (12 pages)

Abstract: As an important part of the modern economy, logistics plays a major role in national economic and social development. The three-dimensional bin packing problem (3D-BPP) is one of the key problems that must be solved to improve the efficiency of logistics operations. Deep reinforcement learning (DRL) has powerful learning and decision-making capabilities, and three-dimensional bin packing methods based on DRL (DRL-3DBP) have become a research hotspot in intelligent logistics. Existing DRL-3DBP methods struggle to balance action space, computational complexity, and exploration capability when solving 3D-BPP with large bins. To address this, the paper proposes a four directional cooperative packing (FDCP) method. A two-stage policy network receives the rotated bin states and generates packing policies for four directions. An action is sampled from each of the four policies and used to update the corresponding state, and the action whose updated state has the highest value is selected as the packing action. FDCP compresses the action space and reduces computational complexity while encouraging the agent to explore reasonable packing positions in all four directions. Experimental results show that FDCP improves space utilization by 1.2% to 2.9% on packing problems with a large 100×100 bin and 20, 30, and 50 items.

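Keywords: three-dimensional bin packing problem; combinatorial optimization problem; deep reinforcement learning; four directional cooperative packing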

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
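The selection step summarized in the abstract (rotate the bin state, sample one action per direction, update the four states, keep the action with the highest value) can be illustrated with a minimal sketch. This is not the authors' implementation. The heightmap state, the softmax sampling in `toy_policy`, the volume-based `toy_value`, and the footprint swap for odd rotations are all assumptions made for illustration, standing in for the paper's two-stage policy network and value estimate. Bin height limits are ignored.

```python
import numpy as np

def place(heightmap, x, y, l, w, h):
    """Drop an l x w x h box with its corner at (x, y); return a new heightmap."""
    new = heightmap.copy()
    # The box rests on the tallest cell beneath its footprint.
    new[x:x + l, y:y + w] = new[x:x + l, y:y + w].max() + h
    return new

def toy_policy(heightmap, l, w, rng):
    """Hypothetical stand-in for the policy network: sample a corner
    position, favouring placements that rest low in the bin."""
    rows, cols = heightmap.shape
    positions = [(x, y) for x in range(rows - l + 1) for y in range(cols - w + 1)]
    rest = np.array([heightmap[x:x + l, y:y + w].max() for x, y in positions],
                    dtype=float)
    probs = np.exp(-(rest - rest.min()))
    probs /= probs.sum()
    return positions[rng.choice(len(positions), p=probs)]

def toy_value(heightmap):
    """Hypothetical stand-in for the value estimate: less volume under the
    surface (boxes plus trapped gaps) scores higher."""
    return -float(heightmap.sum())

def fdcp_step(heightmap, box, rng, policy=toy_policy, value=toy_value):
    """Evaluate one sampled action per direction and keep the best one."""
    l, w, h = box
    candidates = []
    for k in range(4):
        rotated = np.rot90(heightmap, k)  # bin state seen from direction k
        # Assumption: swap the footprint on odd rotations so the box keeps
        # its orientation in the original frame.
        fl, fw = (l, w) if k % 2 == 0 else (w, l)
        x, y = policy(rotated, fl, fw, rng)
        # Update the rotated state, then rotate back to the original frame.
        next_state = np.rot90(place(rotated, x, y, fl, fw, h), -k)
        candidates.append((value(next_state), k, (x, y), next_state))
    best = max(candidates, key=lambda c: c[0])
    return best, [c[0] for c in candidates]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = np.zeros((6, 6), dtype=int)
    state[0:2, 0:2] = 1  # one box already packed in a corner
    (best_value, k, pos, nxt), values = fdcp_step(state, (3, 2, 1), rng)
    print("direction:", k, "position in rotated frame:", pos)
    print(nxt)
```

In this sketch the four candidates differ only in where the sampled action lands, so the value comparison acts as a filter over four exploratory proposals. A trained policy and value network would replace the two `toy_*` functions.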
