Source: Software Engineering and Applications (《软件工程与应用》), 2023, No. 3, pp. 530-542 (13 pages)
Abstract: Reinforcement learning has been applied as a model-free control method to the co-channel interference problem in cellular networks. In value-based reinforcement learning algorithms, however, function-approximation error causes Q-values to be overestimated, so the algorithm converges to a suboptimal policy with poor interference-suppression performance, and convergence is slow in scenarios with many frequency bands. This paper proposes a control method suited to distributed deployment: DDQN is used to learn the discrete policy, and a delayed deep deterministic policy gradient algorithm augmented with a triple-critic mechanism is used to learn the continuous policy. This makes the algorithm's estimates of action values more accurate and improves its interference-suppression performance in scenarios with different numbers of frequency bands. Scalability experiments over varying numbers of frequency bands show that the proposed algorithm converges faster while suppressing co-channel interference more effectively, demonstrating its effectiveness and scalability.
Keywords: distributed reinforcement learning; power control; Actor-Critic algorithm; double deep Q-network (DDQN); delayed deep deterministic policy gradient
CLC Number: TP3 [Automation and Computer Technology - Computer Science and Technology]
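The full paper is not included here, but the two ideas named in the abstract, a double-estimator target for the discrete (DDQN) policy and a minimum over three target critics for the continuous policy, can be illustrated with a minimal sketch. This is not the authors' implementation; the discount factor, reward, and the toy Q-value arrays below are hypothetical placeholders used only to show how each target is formed.

```python
# Minimal sketch of the two target computations described in the abstract.
# All numeric values (gamma, reward, Q estimates) are illustrative assumptions.
import numpy as np

gamma = 0.99   # discount factor (assumed)
reward = 0.5   # one-step reward (assumed)

# --- Double DQN target for the discrete policy ---
# The online network selects the greedy next action; the target network
# evaluates it. Decoupling selection from evaluation is what counters the
# Q-value overestimation the abstract attributes to function approximation.
q_online_next = np.array([1.2, 0.7, 1.5])   # online-net Q(s', a) over 3 actions
q_target_next = np.array([1.0, 0.9, 1.1])   # target-net Q(s', a)
a_star = np.argmax(q_online_next)            # action chosen by the online net
ddqn_target = reward + gamma * q_target_next[a_star]

# --- Triple-critic target for the continuous (power-control) policy ---
# Instead of TD3's pairwise minimum, take the minimum over three target
# critics evaluated at the target actor's next action, further damping
# overestimation of the action value.
critic_estimates = np.array([2.3, 2.1, 2.6])  # Q1, Q2, Q3 at (s', pi'(s'))
triple_critic_target = reward + gamma * np.min(critic_estimates)

print(f"Double DQN target:    {ddqn_target:.3f}")
print(f"Triple-critic target: {triple_critic_target:.3f}")
```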