基于分布式强化学习的功率控制算法研究  

Research on Power Control Algorithm Based on Distributed Reinforcement Learning

在线阅读下载全文

作  者:司轲 李烨[1] 

机构地区:[1]上海理工大学光电信息与计算机工程学院,上海

出  处:《软件工程与应用》2023年第3期530-542,共13页Software Engineering and Applications

摘  要:强化学习作为一种无模型的控制方法被应用于解决蜂窝网络中的同信道干扰问题。然而,在基于值的强化学习算法中,函数逼近存在误差导致Q值被高估,使算法收敛至次优策略而对信道干扰的抑制性能不佳,且在高频带场景中收敛速度缓慢。对此提出一种适用于分布式部署下的控制方法,使用DDQN学习离散策略,以添加三元组批评机制的延迟深度确定性策略梯度算法学习连续策略;使算法对动作价值的估计更准确,以提升算法在不同频带数量场景下对干扰的抑制性能。通过数量的扩展性实验表明了所提算法在不同频带数量场景下,保证更快收敛速度的同时对信道干扰有更好的抑制效果,证明了算法的有效性与扩展性。Reinforcement learning is applied as a model free control method to solve the problem of co channel interference in cellular networks. However, in value based reinforcement learning algorithms, error in function approximation leads to overestimation of the Q value, which leads to the algorithm converging to a suboptimal strategy and poor performance in suppressing channel interference, and the convergence speed is slow in high-frequency scenarios. This paper proposes a control method suitable for distributed deployment, which uses DDQN to learn discrete strategies, and adds a delay-depth deterministic strategy gradient algorithm with a triplet criticism mechanism to learn continuous strategies;Make the algorithm’s estimation of action value more accurate to improve the algorithm’s interference suppression performance under different frequency band number scenarios. Quantitative scalability experiments have shown that the proposed algorithm guarantees faster convergence speed and better suppression of channel interference in different frequency band scenarios, demonstrating the effectiveness and scalability of the algorithm.

关 键 词:分布式强化学习 功率控制 Actor-Critic算法 双重深度Q网络 延迟深度确定性策略梯度 

分 类 号:TP3[自动化与计算机技术—计算机科学与技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象