A Resource Allocation Algorithm for Space-Air-Ground Integrated Network Based on Deep Reinforcement Learning (Cited by: 1)

Authors: LIU Xuefang [1]; MAO Weihao; YANG Qinghai [1] (School of Telecommunications Engineering, Xidian University, Xi'an 710071, China)

Affiliation: [1] School of Telecommunications Engineering, Xidian University, Xi'an 710071, China

Source: Journal of Electronics & Information Technology, 2024, No. 7, pp. 2831-2841 (11 pages)

Funding: National Key R&D Program of China (2020YFB1807700).

Abstract: The Space-Air-Ground Integrated Network (SAGIN) can effectively meet the communication needs of various service types by improving the resource utilization of the ground network, yet existing schemes ignore the system's adaptability and robustness as well as the Quality of Service (QoS) of different users. To address this problem, a Deep Reinforcement Learning (DRL) resource allocation algorithm for urban and suburban communications under the SAGIN architecture is proposed in this paper. Based on the Reference Signal Received Power (RSRP) defined in the 3rd Generation Partnership Project (3GPP) standard and accounting for terrestrial co-channel interference, an optimization problem that maximizes the downlink throughput of system users is constructed, with the time-frequency resources of base stations in different domains as constraints. When the Deep Q-Network (DQN) algorithm is used to solve this problem, a reward function is defined that jointly considers the users' QoS requirements, the system's adaptability, and its robustness. Simulation results show that, when the service requirements of unmanned vehicles, immersive services, and ordinary mobile terminals are considered together, the reward value characterizing system performance is 39.1% higher than that of the greedy algorithm after 2000 iterations; for unmanned vehicle services, resource allocation with the DQN algorithm reduces the number of lost packets by 38.07% on average and the delay by 6.05% compared with the greedy algorithm.
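The abstract states the optimization problem only at a high level. As an illustrative sketch rather than the paper's exact formulation, assume binary assignment variables x_{u,b,k} for user u, base station b (terrestrial, aerial, or satellite), and resource block k, an RB bandwidth B, a per-station time-frequency budget K_b, and noise power \sigma^2; the downlink-throughput maximization could then be written as:

\max_{\{x_{u,b,k}\}} \; \sum_{u}\sum_{b}\sum_{k} x_{u,b,k}\, B \log_2\!\left(1+\mathrm{SINR}_{u,b,k}\right)

\text{s.t.} \quad \sum_{u} x_{u,b,k} \le 1 \;\; \forall b,k, \qquad \sum_{u}\sum_{k} x_{u,b,k} \le K_b \;\; \forall b, \qquad x_{u,b,k}\in\{0,1\},

\mathrm{SINR}_{u,b,k} = \frac{\mathrm{RSRP}_{u,b,k}}{\sum_{b'\ne b}\mathrm{RSRP}_{u,b',k} + \sigma^2}.

The first constraint keeps each resource block exclusive to one user, the second enforces the time-frequency budget of each base station in its own domain, and the SINR term reflects the RSRP-based co-channel interference model referred to in the abstract.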
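Below is a minimal DQN training-loop sketch (Python/PyTorch) of the kind of solver the abstract describes. This is not the paper's implementation: the state and action dimensions, the toy_step stand-in for the SAGIN simulator, and the 1.0/0.5/0.2 reward weights for throughput, packet loss, and delay are placeholder assumptions, and the separate target network of full DQN is omitted for brevity.

# Minimal DQN training-loop sketch (PyTorch). The network sizes, the toy
# environment, and the reward weights are illustrative assumptions, not the
# paper's simulation setup.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

STATE_DIM = 16      # assumed: per-user RSRP / queue / QoS indicators
N_ACTIONS = 8       # assumed: discretized resource-block allocation choices
GAMMA, EPS, BATCH = 0.95, 0.1, 64


class QNet(nn.Module):
    """Q(s, a) approximator: state -> one Q-value per allocation action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, N_ACTIONS),
        )

    def forward(self, s):
        return self.net(s)


def toy_step(state, action):
    """Placeholder for the SAGIN simulator: returns (next_state, reward).

    The reward mixes a throughput term with packet-loss and delay penalties,
    mirroring the multi-objective reward described in the abstract; the
    weights 1.0 / 0.5 / 0.2 are placeholders.
    """
    throughput = torch.rand(1).item()          # normalized downlink throughput
    packet_loss = torch.rand(1).item() * 0.1   # normalized packet-loss level
    delay = torch.rand(1).item() * 0.1         # normalized delay
    reward = 1.0 * throughput - 0.5 * packet_loss - 0.2 * delay
    return torch.rand(STATE_DIM), reward


q_net = QNet()
optimizer = optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)
state = torch.rand(STATE_DIM)

for _ in range(2000):                          # 2000 iterations, as in the abstract
    # Epsilon-greedy selection over the discretized allocation actions.
    if random.random() < EPS:
        action = random.randrange(N_ACTIONS)
    else:
        with torch.no_grad():
            action = int(q_net(state).argmax())

    next_state, reward = toy_step(state, action)
    replay.append((state, action, reward, next_state))
    state = next_state

    if len(replay) >= BATCH:
        batch = random.sample(replay, BATCH)
        s = torch.stack([b[0] for b in batch])
        a = torch.tensor([b[1] for b in batch])
        r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
        s2 = torch.stack([b[3] for b in batch])

        # Standard DQN target: r + gamma * max_a' Q(s', a').
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + GAMMA * q_net(s2).max(dim=1).values
        loss = nn.functional.mse_loss(q_sa, target)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

In a faithful reproduction, toy_step would be replaced by the SAGIN simulator mapping an allocation action to each user's RSRP/SINR, throughput, packet loss, and delay, and the reward weights would encode the per-service QoS requirements.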

Keywords: Space-Air-Ground Integrated Network (SAGIN); resource allocation algorithm; Deep Reinforcement Learning (DRL); Deep Q-Network (DQN)

CLC number: TN929.5 [Electronics and Telecommunications / Communication and Information Systems]

 
