Persistent self-learning algorithm for inventory routing problem with periodic demand fluctuation  (Cited by: 1)


Authors: GUO Yuhan (郭羽含) [1]; LI Jinning (李津宁) [2]; SHEN Xueli (沈学利) [2]

Affiliations: [1] School of Science, Zhejiang University of Science and Technology, Hangzhou 310023, Zhejiang, China; [2] College of Software, Liaoning Technical University, Huludao 125105, Liaoning, China

Source: Computer Integrated Manufacturing Systems (《计算机集成制造系统》), 2024, No. 4, pp. 1487-1505 (19 pages)

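Funding: National Natural Science Foundation of China (61404069); Natural Science Foundation of Liaoning Province (2019-ZD-0048); Basic Research Project of the Education Department of Liaoning Province (LJ2019JL012).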

Abstract: The inventory routing problem of bike-sharing systems is an inventory routing problem with a restricted total product volume and periodically fluctuating demand. Its optimization must balance resource utilization against scheduling cost, and for large-scale instances it is difficult to guarantee solving efficiency and solution quality at the same time. To address this challenge, the problem was formalized as a multi-objective sequential decision-making Markov process, a time-series-based mixed integer programming model was established, and a global persistent self-learning algorithm was proposed. The algorithm consists of three stages: offline learning, online planning, and persistent learning. In the offline learning stage, a multi-agent cooperative algorithm based on random strategy was designed to obtain a quantitative description of the spatiotemporal distribution of delivery vehicles and of the demand patterns at demand points. In the online planning stage, the demand pattern in each time step was predicted from historical order data to determine the optimal inventory allocation quantity, and the supplier's delivery vehicles were dispatched using the quantitative information obtained in the offline learning stage. In the persistent learning stage, after each processing cycle ended, the recorded order data were used to continually evaluate and improve the dispatching results within that cycle. Experiments based on real enterprise data showed that the proposed algorithm outperformed the comparison methods on comprehensive evaluation indexes including the complexity of the demand prediction model, solution quality, the total number of dispatched vehicles, total dispatch distance, and station improvement degree. In addition, a comparative analysis of multiple strategies summarized the cost variation pattern of the inventory problem and verified the algorithm's effectiveness on large-scale instances.
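
The three-stage structure described in the abstract (offline learning, online planning, persistent learning) can be illustrated with a deliberately simplified sketch. This is not the authors' algorithm: plain averaging stands in for the multi-agent offline learner, proportional allocation under a fixed fleet size stands in for the mixed integer programming model, exponential smoothing stands in for the persistent learning update, and vehicle routing is omitted entirely. All function names, parameters, and data shapes are assumptions made for illustration.

```python
from collections import defaultdict


def offline_learn(history):
    """Offline stage (toy): mean demand per (station, time_step).

    `history` is a hypothetical list of (station, time_step, demand) records.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for station, step, demand in history:
        totals[(station, step)] += demand
        counts[(station, step)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def plan_allocation(pattern, stations, step, total_bikes):
    """Online stage (toy): split a fixed fleet in proportion to predicted demand."""
    predicted = [pattern.get((s, step), 0.0) for s in stations]
    total = sum(predicted)
    if total == 0:
        # No demand information: spread the fleet evenly.
        shares = [total_bikes / len(stations)] * len(stations)
    else:
        shares = [total_bikes * p / total for p in predicted]
    alloc = [int(x) for x in shares]
    # Largest-remainder rounding so allocations sum exactly to total_bikes.
    leftover = total_bikes - sum(alloc)
    order = sorted(range(len(stations)),
                   key=lambda i: shares[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return dict(zip(stations, alloc))


def persistent_update(pattern, observed, rate=0.2):
    """Persistent stage (toy): blend one cycle's observed demand into the pattern."""
    updated = dict(pattern)
    for key, demand in observed.items():
        old = updated.get(key, demand)
        updated[key] = (1 - rate) * old + rate * demand
    return updated
```

In this sketch the total-volume restriction appears only as the requirement that allocations sum to `total_bikes`; the paper's model additionally covers scheduling cost and vehicle dispatch, which this example does not attempt.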

Keywords: inventory routing; periodic fluctuation of product demand; reinforcement learning; online planning; persistent learning

Classification code: TP11 [Automation and Computer Technology - Control Theory and Control Engineering]

 
