Timeliness optimization of real-time scheduling for satellite dynamic tasks based on deep reinforcement learning  (Cited by: 1)


Authors: Ke LI [1,2,3]; Shunrui XIONG; Penglin DAI; Tongyu SONG; Xumin YU; Tianrui LI (School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China; Sustainable Urban Transportation Intelligent Technology Ministry of Education Engineering Research Center, Chengdu 611756, China; Sichuan Network Communication Technology Key Laboratory, Chengdu 611756, China; Zhejiang Lab, Hangzhou 311121, China; Institute of Communication and Navigation Satellite, China Academy of Space Technology, Beijing 100094, China)

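Affiliations: [1] School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756 [2] Sustainable Urban Transportation Intelligent Technology Ministry of Education Engineering Research Center, Chengdu 611756 [3] Sichuan Network Communication Technology Key Laboratory, Chengdu 611756 [4] Zhejiang Lab, Hangzhou 311121 [5] Institute of Communication and Navigation Satellite, China Academy of Space Technology, Beijing 100094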

Source: Scientia Sinica (Informationis), 2024, No. 10, pp. 2443-2469 (27 pages)

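Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62202392, 62172342, 62002300, 61941106); the Network and Data Security Key Laboratory of Sichuan Province (Grant No. NDS2022-1); the Natural Science Foundation of Sichuan (Grant Nos. 2023NSFSC0459, 2022NSFSC0944); and the Natural Science Foundation of Hebei Province (Grant No. F2022105003).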

Abstract: With the rapid growth in the number of satellites worldwide and the vigorous development of space-based networks, optimizing satellite task scheduling to ensure the timeliness of task observation has become crucial. The task scheduling method not only affects the efficiency of observation data acquisition but also directly determines whether the space-based information system can respond promptly to multiple real-time application requirements. However, for aperiodic dynamic tasks, traditional batch scheduling methods are limited because a decision can only be made after information on all tasks has been collected, and existing real-time scheduling methods based on deep reinforcement learning cannot guarantee the observation timeliness of urgent tasks. Given this, this study is the first to propose the problem of timeliness-optimized real-time scheduling of dynamic tasks for agile satellites. The problem defines a task observation timeliness metric that jointly considers the task observation delay and the total benefit of accepted tasks, and aims to maximize the observation timeliness of all tasks. To solve the problem, a two-stage timeliness optimization algorithm, PPODL-HR, is designed. In the task selection stage, proximal policy optimization with a deep neural network and a long short-term memory network (PPODL) is proposed to accelerate the convergence of model training. In the resource allocation stage, a heuristic rule (HR) is designed to further reduce, through task merging, the satellite transition time required for task switching. Numerical simulation and STK simulation verify that the PPODL-HR algorithm outperforms traditional static batch scheduling and existing dynamic real-time scheduling algorithms in terms of task observation timeliness, and that it is applicable to different task densities and different proportions of urgent tasks. In particular, compared with the classical real-time scheduling algorithm for dynamic tasks, task observation timeliness is improved by 21.14%, task observation delay is reduced by 4.55%, and the total benefit of accepted tasks is increased by 20.70%.
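The abstract names proximal policy optimization (PPO) as the basis of the task selection stage but gives no formulas. Purely as an illustration, the sketch below computes the standard PPO clipped surrogate objective in plain Python. The function name, the clipping value `eps=0.2`, and the input format are assumptions made for this example and are not taken from the paper; the deep neural network and LSTM components of PPODL are not modeled here.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    eps: clipping range; 0.2 is a common default, assumed here.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage; clipping only takes effect once the policy moves far enough from the one that collected the data.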

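Keywords: timeliness optimization; satellite task scheduling; real-time scheduling of dynamic tasks; deep reinforcement learning; task observation timeliness; heuristic rules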

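Classification No.: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; V474 [Automation and Computer Technology: Control Science and Engineering]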

 
