Authors: LIU Runzi, MA Tianci, WU Weihua, YAO Chenhong[1], YANG Qinghai[3]
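Affiliations: [1] School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, Shaanxi 710399, China; [2] School of Physics and Information Technology, Shaanxi Normal University, Xi'an, Shaanxi 710119, China; [3] School of Telecommunications Engineering, Xidian University, Xi'an, Shaanxi 710071, China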
Source: Journal on Communications (《通信学报》), 2023, No. 7, pp. 207-217 (11 pages)
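Funding: National Natural Science Foundation of China (No.61701365, No.61801365, No.61971327); Natural Science Basic Research Program of Shaanxi Province (No.2023-JC-YB-566, No.2023-JC-YB-542); Key Research and Development Program of Shaanxi Province (No.2021GY-066); Young Talent Support Program of the Shaanxi University Association for Science and Technology (No.20200112); Postdoctoral Science Foundation of Shaanxi Province (No.2018BSHEDZZ47).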
Abstract: In recent years, as the number of emergency tasks of various kinds has kept growing, guaranteeing system revenue while limiting the impact on common tasks has become a major challenge for dynamic task scheduling in relay satellite networks. To address this problem, a dynamic task scheduling method for relay satellite networks based on hierarchical reinforcement learning was proposed, with the objectives of maximizing the total revenue of emergency tasks and minimizing the damage to common tasks. Specifically, to account for both the long-term and the short-term performance of the system, a two-layer scheduling framework implemented by an upper-level and a lower-level DQN was designed: the upper-level DQN determines a temporary optimization goal based on long-term performance, and the lower-level DQN determines the scheduling strategy for the current task according to that goal. Simulation results show that, compared with traditional deep learning methods and several heuristic methods for dynamic scheduling, the proposed method increases the total revenue of emergency tasks while reducing the damage to common tasks.
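The abstract does not give implementation details. Purely as an illustration of the two-level decision structure it describes (an upper level choosing a temporary optimization goal, a lower level choosing a scheduling action conditioned on that goal), the sketch below replaces both DQNs with tabular Q-learning. The goal set, the action set ("preempt" or "reject" an emergency task), the state encoding (capped preemption count), the reward shaping and the weight of 0.5 are all hypothetical assumptions for this sketch, not the authors' design.

```python
import random
from collections import defaultdict

# Hypothetical temporary objectives the upper level can set.
GOALS = ("max_emergency_revenue", "min_common_damage")
# Hypothetical scheduling decisions for one incoming emergency task.
ACTIONS = ("preempt", "reject")


def epsilon_greedy(q, state, options, epsilon, rng):
    """Pick a random option with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.choice(options)
    return max(options, key=lambda o: q[(state, o)])


def q_update(q, state, choice, reward, next_state, options, alpha=0.5, gamma=0.9):
    """One-step tabular Q-learning update (a stand-in for a DQN's TD target)."""
    best_next = max(q[(next_state, o)] for o in options)
    target = reward + gamma * best_next
    q[(state, choice)] += alpha * (target - q[(state, choice)])
    return q[(state, choice)]


def lower_reward(goal, revenue, damage):
    """Short-term reward for the lower level, conditioned on the upper level's goal."""
    if goal == "max_emergency_revenue":
        return revenue
    return -damage


def run_episode(tasks, q_upper, q_lower, epsilon, rng, weight=0.5):
    """Schedule a sequence of (revenue, damage) emergency tasks with both levels."""
    preemptions = 0
    total_revenue = 0.0
    total_damage = 0.0
    for revenue, damage in tasks:
        upper_state = min(preemptions, 3)  # hypothetical coarse long-term state
        goal = epsilon_greedy(q_upper, upper_state, GOALS, epsilon, rng)
        lower_state = (goal, upper_state)
        action = epsilon_greedy(q_lower, lower_state, ACTIONS, epsilon, rng)
        if action == "preempt":
            r_gain, d_gain = revenue, damage
            preemptions += 1
        else:
            r_gain, d_gain = 0.0, 0.0
        total_revenue += r_gain
        total_damage += d_gain
        next_upper = min(preemptions, 3)
        # Lower level learns from the goal-conditioned, short-term reward.
        q_update(q_lower, lower_state, action,
                 lower_reward(goal, r_gain, d_gain), (goal, next_upper), ACTIONS)
        # Upper level learns from a combined revenue-minus-damage reward.
        q_update(q_upper, upper_state, goal,
                 r_gain - weight * d_gain, next_upper, GOALS)
    return total_revenue, total_damage


if __name__ == "__main__":
    rng = random.Random(42)
    q_upper, q_lower = defaultdict(float), defaultdict(float)
    for episode in range(200):
        # Hypothetical tasks: (emergency revenue, damage to common tasks if preempting).
        tasks = [(rng.uniform(1, 10), rng.uniform(0, 8)) for _ in range(20)]
        eps = max(0.05, 1.0 - episode / 150)  # decaying exploration rate
        revenue, damage = run_episode(tasks, q_upper, q_lower, eps, rng)
    print(f"last episode: revenue={revenue:.1f}, damage={damage:.1f}")
```

Splitting the reward this way lets the lower level optimize a simple, immediate objective while the upper level trades revenue against damage over time; in the method described above, both tables would instead be deep Q-networks over much richer task and resource states.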
Keywords: relay satellite network; task scheduling; deep reinforcement learning; multi-objective optimization; dynamic scheduling
CLC number: TN92 [Electronics and Telecommunications / Communication and Information Systems]