Authors: ZHANG Qian (张茜) [1]; SU Dongdong (苏冬冬); ZHANG Cong (张聪); LI Runchuan (李润川)
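Affiliations: [1] School of Artificial Intelligence, Zhongyuan University of Technology, Zhengzhou 450007, China; [2] School of Computer Science, Zhongyuan University of Technology, Zhengzhou 450007, China; [3] Jiangxing Intelligence Inc., Shenzhen 518100, Guangdong, China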
Source: Telecommunication Engineering (《电讯技术》), 2024, No. 11, pp. 1750-1757 (8 pages)
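Funding: Henan Province Science and Technology Research Program (242102211046); Key Scientific Research Projects of Colleges and Universities in Henan Province (25A520039, 24B520048); Zhongyuan University of Technology Advantageous Discipline Strength Enhancement Program (SD202230); Zhongyuan University of Technology Graduate Education and Teaching Reform Research Projects (JG202424, JG202328); Zhongyuan University of Technology Fundamental Research Funds (K2022QN021).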
Abstract: A deep reinforcement learning-based multi-agent collaborative task offloading algorithm (MCTO-DRL) is proposed for multi-user collaborative task offloading in mobile edge computing. A network model for multi-user collaborative task offloading is constructed that accounts for user mobility, collaboration, dynamic task priority, and resource constraints. On this basis, an end-to-end optimization objective function is established, and the multi-task collaborative offloading problem is formalized as a Markov decision process (MDP). A bidirectional long short-term memory (Bi-LSTM) network extracts features of the dynamic time-series dependencies in the state vector, and reinforcement learning is used to learn the mapping between the high-dimensional state and the action. A dynamic-priority collaborative sampling algorithm is also designed to improve cooperation among the agents. Experimental analysis shows that, in the multi-agent collaborative task offloading scenario, the optimal offloading probability of MCTO-DRL exceeds 86%, and its cumulative reward per time slot is about 20.0%, 16.23%, 22.0%, and 9.44% higher than that of the four baseline algorithms, respectively. The algorithm also adapts to offloading tasks of different complexity and requirement types.
Keywords: mobile edge computing; deep reinforcement learning; collaborative offloading; bidirectional long short-term memory (Bi-LSTM) network
Classification number: TN929.5 [Electronics and Telecommunications: Communication and Information Systems]