Distributed Reinforcement Learning Based Power Control for Frequency Division Multiple Access Systems


Authors: Li Ye [1]; Si Ke (School of Optical-Electrical & Computer Engineering, University of Shanghai for Science & Technology, Shanghai 200093, China)

Affiliation: [1] School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

Source: Application Research of Computers (《计算机应用研究》), 2023, No. 12, pp. 3772-3777 (6 pages)

Funding: Cooperative research project funded by Huawei Technologies Co., Ltd. (YBN2019115054).

Abstract: In recent years, deep reinforcement learning has been used as a model-free resource allocation method to address co-channel interference in wireless networks. However, networks trained with a conventional experience replay strategy struggle to learn valuable experiences, which slows convergence, and manually fixing the exploration step size ignores how well the algorithm has learned in each training cycle, making exploration of the environment blind and limiting the achievable spectral efficiency. This paper proposes a distributed reinforcement learning power control method for frequency division multiple access systems. It adopts a prioritized experience replay strategy that encourages agents to learn from the more important transitions in the environment, accelerating the learning process, and it designs an exploration strategy with a dynamically adjusted step size suited to distributed reinforcement learning, which lets each agent explore its local environment according to its own learning progress and reduces the blindness introduced by manually set step sizes. Experimental results show that, compared with existing algorithms, the proposed method converges faster, suppresses co-channel interference better in mobile scenarios, and achieves higher performance in large networks.

Keywords: distributed reinforcement learning; frequency division multiple access system; power control; greedy policy; prioritized experience replay; dynamic step-size adjustment

CLC number: TP929.5 [Automation and Computer Technology]
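
The abstract above names two mechanisms: prioritized experience replay and a dynamically adjusted exploration step size. The paper's exact update rules are not reproduced here, so the following Python sketch only illustrates the general idea under assumed hyper-parameters: a proportional prioritized replay buffer (in the spirit of standard prioritized experience replay) and a hypothetical per-cycle epsilon adjustment driven by the previous training loss. The names PrioritizedReplayBuffer and adaptive_epsilon, and all constants, are illustrative and not the authors' implementation.

```python
# Minimal sketch of (1) prioritized experience replay and (2) a dynamically
# adjusted exploration step; illustrative only, not the paper's algorithm.
import numpy as np


class PrioritizedReplayBuffer:
    """Proportional prioritized replay: transitions with larger TD error
    are sampled more often, so the agent focuses on 'valuable' experiences."""

    def __init__(self, capacity=10000, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []

    def add(self, transition, td_error=1.0):
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) >= self.capacity:   # drop the oldest when full
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size=32):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        return [self.data[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        # Refresh priorities after the learner recomputes TD errors.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha


def adaptive_epsilon(epsilon, cycle_loss, prev_loss,
                     decay=0.95, grow=1.05, eps_min=0.01, eps_max=1.0):
    """Hypothetical dynamic exploration step: shrink epsilon when the last
    training cycle's loss improved, otherwise explore slightly more."""
    factor = decay if cycle_loss < prev_loss else grow
    return float(np.clip(epsilon * factor, eps_min, eps_max))
```

In a distributed setting one would expect each transmitter-receiver pair (agent) to hold such a buffer locally and to adjust its own epsilon once per training cycle; this is an assumption about how the pieces could fit together, not a description of the paper's training loop.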
