Authors: SONG Wang; HU Xiang [1]; ZHANG Yuhui; WEI Wenjiang; ZHOU Yashi; KANG Ao
Affiliation: [1] School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
Source: Journal of Data Acquisition and Processing (《数据采集与处理》), 2023, No. 3, pp. 652-664 (13 pages)
Funding: National Natural Science Foundation of China (52078212).
Abstract: This paper proposes an order dispatch algorithm for online ride-hailing platforms, based on mean-field multi-agent reinforcement learning, that can perceive global supply-demand dynamics. The algorithm improves collaboration among agents within a local area by combining multi-agent reinforcement learning with mean-field theory, and it improves the agents' ability to perceive and optimize the global supply-demand distribution by injecting information about how supply and demand are dynamically distributed across the whole area. A simulator driven by real historical data was built for training and evaluating the algorithms. Experiments show that in two scenarios, a whole day and the rush-hour period, the proposed algorithm significantly outperforms existing order dispatch algorithms on two key metrics: drivers' accumulated income and order response rate. The experimental results validate the effectiveness of the proposed algorithm.
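Keywords: multi-agent reinforcement learning; mean field; global supply-demand dynamics awareness; online ride-hailing platform; order dispatch
Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]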