Research on the Optimization of Humanitarian Emergency Material Allocation Based on Reinforcement Learning

Authors: ZHANG Jianjun (张建军)[1]; YANG Yundan (杨云丹); ZHOU Yizhuo (周一卓)

Affiliation: [1] School of Economics and Management, Tongji University, Shanghai 200092, China

Source: Shanghai Management Science (《上海管理科学》), 2025, No. 2, pp. 109-117 (9 pages)

Funding: National Natural Science Foundation of China (M-0310); Shanghai Soft Science Key Project (23692109300); Shanghai Social Science Planning Project (2022ZGL011).

Abstract: How relief organizations can efficiently allocate limited humanitarian aid supplies after a major emergency, meeting the material needs of affected areas while reducing the suffering of disaster victims, is an important research topic. To address this problem, this paper formulates a Mixed Integer Nonlinear Programming (MINLP) model that requires solving for a multi-period dynamic optimal allocation strategy. Reinforcement learning (RL), one of the two mainstream approaches to policy exploration, adjusts its policy using feedback signals obtained by interacting with the environment and thereby adapts to external dynamics. It is highly scalable and better suited to dynamic material allocation than heuristic algorithms that solve for specific states. The paper therefore uses the Dueling DQN algorithm to solve for the optimal policy, avoiding the Q-value overestimation that affected earlier applications of RL to humanitarian material allocation and estimating the action-value function of the affected areas more accurately. The paper also introduces a stochastic demand assumption, which brings the model closer to actual disaster conditions and improves its validity and realism. Using the Ya'an earthquake as the setting, a numerical example verifies the algorithm's effectiveness; the authors describe this as the first paper to use real data sources to support RL-based optimization of emergency material allocation. Compared with the traditional DQN method, the Dueling DQN algorithm reduces total cost by approximately 5%, meaning that it reduces the suffering of the affected population more effectively while still ensuring material supply. This reflects China's "people-oriented" rescue principle and has theoretical and practical significance for humanitarian emergency relief.
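
The abstract names Dueling DQN but gives no architectural detail. As a rough illustration only, the sketch below shows the standard dueling aggregation step, in which a state value V(s) and per-action advantages A(s, a) are combined as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). The function names, the three candidate allocation actions, and all numbers are hypothetical assumptions for illustration; they are not taken from the paper's model or data.

```python
import numpy as np


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-advantage baseline.

    state_value: scalar, or array of shape (batch, 1)
    advantages:  array of shape (n_actions,) or (batch, n_actions)
    """
    advantages = np.asarray(advantages, dtype=float)
    # Subtracting the mean advantage makes the V/A split identifiable:
    # the mean of Q over actions then equals V(s).
    baseline = advantages.mean(axis=-1, keepdims=True)
    return np.asarray(state_value, dtype=float) + advantages - baseline


def greedy_action(q_values):
    """Index of the action with the highest estimated Q-value."""
    return int(np.argmax(q_values, axis=-1))


# Hypothetical numbers: one state and three candidate allocation actions.
# In a cost-minimization setting the reward could be the negative cost,
# which is an assumption here, not something the abstract states.
q = dueling_q_values(state_value=2.0, advantages=[1.0, 0.0, -1.0])
best = greedy_action(q)
```

In a full agent, V(s) and A(s, a) would come from two heads of one neural network; here they are passed in directly so that the aggregation step can be read in isolation.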

Keywords: deep reinforcement learning; humanitarianism; emergency material allocation; Dueling DQN algorithm

Classification code: F25 [Economics and Management: National Economy]
