A Four Directional Cooperative Three-dimensional Packing Method Based on Deep Reinforcement Learning


Authors: YIN Hao; CHEN Fan[2]; HE Hong-Jie[1]

Affiliations: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756 [2] School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756

Source: Acta Automatica Sinica (《自动化学报》), 2024, No. 12, pp. 2420-2431 (12 pages)

Abstract: As an important part of the modern economy, logistics plays an important role in the national economy and social development. The three-dimensional bin packing problem (3D-BPP) in logistics is one of the key problems that must be solved to improve the efficiency of logistics operations. Deep reinforcement learning (DRL) has powerful learning and decision-making abilities, and three-dimensional bin packing methods based on DRL (DRL-3DBP) have become a research hotspot in intelligent logistics. Existing DRL-3DBP methods struggle to balance action space, computational complexity, and exploration capability when solving 3D-BPP with large bins. To this end, this paper proposes a four directional cooperative packing (FDCP) method. A two-stage policy network receives the rotated bin states and generates packing policies for four directions. The four corresponding states are updated with the actions sampled from the four policies, and the action whose updated state has the highest value is selected as the packing action. FDCP compresses the action space and reduces computational complexity while encouraging the agent to explore reasonable packing positions in all four directions. Experimental results show that FDCP improves space utilization by 1.2% to 2.9% on packing problems with a large 100×100 bin and 20, 30, and 50 items.
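The selection loop described in the abstract (rotate the bin state, sample one action per direction, update the four states, keep the one with the highest value) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the bin is modelled as a 2D height map, `stand_in_policy` replaces the two-stage policy network with uniform random sampling, and `stand_in_value` replaces the learned value estimate with the negative maximum stack height. All function names are hypothetical.

```python
import random


def rotate90(grid):
    """Rotate a 2D height map by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate(grid, k):
    """Apply k clockwise quarter turns (k is taken modulo 4)."""
    for _ in range(k % 4):
        grid = rotate90(grid)
    return grid


def place(height_map, item, x, y):
    """Drop an l x w x h item with its corner at (x, y); return a new height map."""
    l, w, h = item
    cells = [(i, j) for i in range(x, x + l) for j in range(y, y + w)]
    base = max(height_map[i][j] for i, j in cells)  # item rests on the tallest cell
    new_map = [row[:] for row in height_map]
    for i, j in cells:
        new_map[i][j] = base + h
    return new_map


def stand_in_policy(view, item, rng):
    """Placeholder for the policy network (assumption): uniform over feasible corners."""
    l, w, _ = item
    feasible = [(x, y)
                for x in range(len(view) - l + 1)
                for y in range(len(view[0]) - w + 1)]
    return rng.choice(feasible)


def stand_in_value(state):
    """Placeholder for the learned value (assumption): lower stacks score higher."""
    return -max(max(row) for row in state)


def fdcp_step(height_map, item, rng):
    """One packing step: try all four rotated views, keep the highest-value result."""
    candidates = []
    for k in range(4):
        view = rotate(height_map, k)              # rotated bin state
        x, y = stand_in_policy(view, item, rng)   # action sampled for this direction
        new_view = place(view, item, x, y)        # updated state in the rotated frame
        state = rotate(new_view, 4 - k)           # rotate back to the original frame
        candidates.append((stand_in_value(state), k, state))
    _, best_k, best_state = max(candidates, key=lambda c: c[0])
    return best_k, best_state


if __name__ == "__main__":
    rng = random.Random(0)
    bin_state = [[0] * 4 for _ in range(4)]       # toy 4 x 4 bin, not the paper's 100 x 100
    for item in [(2, 2, 1), (2, 2, 1), (1, 3, 2)]:
        direction, bin_state = fdcp_step(bin_state, item, rng)
        print("direction", direction, "max height", -stand_in_value(bin_state))
```

In this sketch each direction only needs a policy over one rotated view, which loosely mirrors the abstract's point about compressing the action space while still exploring positions in all four directions. For a non-square bin or an odd number of quarter turns, the item's footprint ends up rotated in the original frame, which the sketch treats as just another orientation.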

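Keywords: three-dimensional bin packing problem; combinatorial optimization problem; deep reinforcement learning; four directional cooperative packing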

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
