Deep Reinforcement Learning Approach for Power Allocation in Massive MIMO Systems


Authors: LI Ye[1], XIAO Meng-qiao (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)

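Affiliation: [1] School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China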

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2023, No. 10, pp. 2221-2227 (7 pages)

Funding: Supported by a cooperation project with Huawei Technologies Co., Ltd. (YBN2019115054).

Abstract: Research on maximizing the downlink sum spectral efficiency of multi-cell multi-user massive MIMO systems remains scarce, and existing work usually ignores the imperfection of uplink channel state information. This paper therefore studies downlink sum spectral efficiency optimization for multi-cell multi-user massive MIMO systems under imperfect uplink channel state information. With the goal of maximizing the downlink sum spectral efficiency, it proposes two power allocation methods, one based on the deep Q-network and one based on the deep deterministic policy gradient. The deep Q-network can address the dimensionality explosion and lack of generalization encountered in communication systems, but because the Q-learning algorithm applies only to discrete spaces, the transmit power must be quantized. The deep deterministic policy gradient, by contrast, is an algorithm suited to continuous action spaces and avoids the performance degradation caused by power quantization. Simulation results show that, compared with traditional power allocation methods, the proposed methods achieve better sum spectral efficiency with much lower time complexity. Furthermore, the deep deterministic policy gradient method outperforms the deep Q-network method in both sum spectral efficiency and time complexity.
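The difference between the discrete and continuous action spaces mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform power grid, the actor output range of [-1, 1], and all function names (`power_levels`, `quantize`, `continuous_power`, `max_quantization_error`) are assumptions made purely for illustration.

```python
def power_levels(p_max, num_levels):
    """Uniformly spaced candidate powers in [0, p_max].

    Assumption: a hypothetical discrete action set for a DQN-style agent;
    the paper's actual quantization scheme is not given in the abstract.
    """
    if num_levels < 2:
        raise ValueError("need at least two power levels")
    step = p_max / (num_levels - 1)
    return [i * step for i in range(num_levels)]


def quantize(p, levels):
    """Return the discrete level closest to the desired power p."""
    return min(levels, key=lambda q: abs(q - p))


def max_quantization_error(p_max, num_levels):
    """Worst-case gap between a desired power and its nearest level
    on a uniform grid: half of the grid spacing."""
    return p_max / (num_levels - 1) / 2.0


def continuous_power(a, p_max):
    """Map a bounded actor output a in [-1, 1] (assumed, e.g. a tanh
    output layer) to a transmit power in [0, p_max] without quantization."""
    a = max(-1.0, min(1.0, a))  # clip out-of-range outputs
    return 0.5 * (a + 1.0) * p_max


if __name__ == "__main__":
    levels = power_levels(1.0, 5)
    desired = 0.6
    print("discrete choice:", quantize(desired, levels))
    print("continuous choice:", continuous_power(0.2, 1.0))
    print("worst-case grid error:", max_quantization_error(1.0, 5))
```

In this sketch a discrete agent can only pick the grid point nearest to the power it would prefer, so its choice may be off by up to half the grid spacing, whereas the continuous mapping can represent any power in the allowed range. A finer grid shrinks the error but enlarges the discrete action set.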

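Keywords: massive MIMO; power allocation; deep reinforcement learning; sum spectral efficiency; imperfect channel state information; pilot contamination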

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
