Research on multi-objective edge task scheduling based on deep reinforcement learning

Cited by: 5

Authors: Sheng Yu (盛煜), Zhu Zhengwei (朱正伟)[1], Zhu Chenyang (朱晨阳)[2], Zhu Yanping (诸燕平)[1]

Affiliations: [1] School of Microelectronics and Control Engineering, Changzhou University, Changzhou 213000, China; [2] School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213000, China

Source: Electronic Measurement Technology (《电子测量技术》), 2023, No. 8, pp. 74-81 (8 pages)

Funding: Supported by the Changzhou Key Research and Development Program (CJ20210123).

Abstract: To address the unstable convergence and poor optimization results of deep reinforcement learning in multi-objective task scheduling for edge computing, this paper proposes a multi-objective task scheduling algorithm based on an improved dueling double deep Q-network (IMTS-D3QN). First, the double deep Q-network decomposes the max operation in the target into action selection and action evaluation, decoupling the two in order to eliminate overestimation. An immediate-reward experience classification method stores experience samples in classes according to their importance, and training draws more of the high-importance samples, which improves the utilization of experience samples and speeds up neural network training. Then, the neural network is optimized by introducing a dueling network architecture. Finally, a soft update method is used to improve the stability of the algorithm, and a dynamic ε-greedy strategy with exponential decay is used to search for the optimal policy. Pareto-optimal solutions are obtained from different linear weighting combinations, with the goal of minimizing response time and energy consumption. Experimental results show that, compared with other algorithms, IMTS-D3QN clearly improves response time and energy consumption across different numbers of tasks.

Keywords: edge computing; task scheduling; multi-objective; deep reinforcement learning

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]
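
The abstract names several building blocks without giving formulas. The following is a minimal, framework-free sketch of how such pieces are commonly written, not the authors' implementation. Every function name, parameter, and default value (`gamma`, `tau`, `decay`, `threshold`, `high_fraction`, `w_time`) is an assumption made for illustration, and the two-pool replay buffer is only one plausible reading of "classifying experience samples by immediate reward".

```python
import math
import random
from collections import deque


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def dueling_q_values(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def soft_update(target_params, online_params, tau=0.01):
    """Soft update: move each target parameter a fraction tau toward the online one."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay=0.001):
    """Exponentially decaying exploration rate for an epsilon-greedy policy."""
    return eps_end + (eps_start - eps_end) * math.exp(-decay * step)


def scalarized_reward(response_time, energy, w_time=0.5):
    """Linear weighting of the two objectives; negated because both are minimized.
    Sweeping w_time over [0, 1] yields candidate points for a Pareto front."""
    return -(w_time * response_time + (1.0 - w_time) * energy)


class RewardClassifiedReplay:
    """Hypothetical two-pool replay buffer: transitions are classified by their
    immediate reward, and each batch favors the high-reward pool."""

    def __init__(self, threshold=0.0, high_fraction=0.7, capacity=10000):
        self.threshold = threshold
        self.high_fraction = high_fraction
        self.high = deque(maxlen=capacity)
        self.low = deque(maxlen=capacity)

    def add(self, transition):
        # transition = (state, action, reward, next_state, done)
        reward = transition[2]
        pool = self.high if reward >= self.threshold else self.low
        pool.append(transition)

    def sample(self, batch_size):
        # Take up to high_fraction of the batch from the high-reward pool,
        # and fill the rest from the low-reward pool where available.
        n_high = min(len(self.high), round(batch_size * self.high_fraction))
        n_low = min(len(self.low), batch_size - n_high)
        return random.sample(list(self.high), n_high) + random.sample(list(self.low), n_low)
```

In a full agent these pieces would wrap a neural network library. Plain lists stand in for tensors here so that the arithmetic of each step stays visible.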
