Authors: LI Guoquan (李国权)[1], CHENG Tao (程涛), GUO Yongcun (郭永存), PANG Yu (庞宇), LIN Jinzhao (林金朝) (School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Chongqing Key Laboratory of Optoelectronic Information Sensing and Microsystems, Chongqing 400065, China)
Affiliations: [1] School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; [2] Chongqing Key Laboratory of Optoelectronic Information Sensing and Microsystems, Chongqing 400065, China
Source: 《电子与信息学报》 (Journal of Electronics & Information Technology), 2025, Issue 3, pp. 657-665 (9 pages)
Funding: National Natural Science Foundation of China (U21A20447); Chongqing Natural Science Foundation Innovation Group Project (cstc2020jcyj-cxttX0002).
Abstract: To further improve the spectrum utilization of multi-user wireless communication systems, this paper proposes a deep-reinforcement-learning-based sum-rate maximization algorithm for secondary users in an Intelligent Reflecting Surface (IRS)-assisted cognitive radio network. First, a resource allocation model that jointly optimizes the secondary base station (SBS) beamforming and the IRS phase-shift matrix is established, subject to the maximum transmit power of the SBS, the interference tolerance of the primary users to the SBS, and the unit-modulus constraint on the IRS phase-shift matrix. A joint active and passive beamforming algorithm based on the Deep Deterministic Policy Gradient is then proposed to optimize both sets of variables and maximize the secondary-user sum rate. Simulation results show that the proposed algorithm achieves sum-rate performance close to that of conventional optimization algorithms at a lower time complexity.

Objective: With the rapid development of wireless communication technologies, the demand for spectrum resources has increased significantly. Cognitive Radio (CR) has emerged as a promising solution for improving spectrum utilization by enabling Secondary Users (SUs) to access licensed spectrum bands without causing harmful interference to Primary Users (PUs). However, traditional CR networks struggle to achieve high spectral efficiency because they have limited control over the wireless environment. Intelligent Reflecting Surfaces (IRS) have recently been introduced as a technology that enhances communication performance by dynamically reconfiguring the propagation environment. This paper aims to maximize the sum rate of SUs in an IRS-assisted CR network by jointly optimizing the active beamforming at the Secondary Base Station (SBS) and the passive beamforming at the IRS, subject to constraints on the maximum transmit power of the SBS, the interference tolerance of the PUs, and the unit modulus of the IRS phase shifts.

Methods: To address this non-convex, highly coupled optimization problem, a Deep Reinforcement Learning (DRL)-based algorithm is proposed. The problem is formulated as a Markov Decision Process (MDP) in which the state space comprises the Channel State Information (CSI) of the entire system and the Signal-to-Interference-plus-Noise Ratio (SINR) of the SU network, while the action space consists of the SBS beamforming vectors and the IRS phase-shift matrix. The reward function is designed to maximize the sum rate of the SUs while penalizing constraint violations. The Deep Deterministic Policy Gradient (DDPG) algorithm is used to solve the MDP owing to its ability to handle continuous action spaces. The DDPG framework consists of an actor network, which outputs the actions, and a critic network, which evaluates those actions against the reward function. Training proceeds by interacting with the environment to learn the optimal policy, and the algorithm is tuned to ensure convergence and robustness.
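The joint design described in the abstract can be summarized as a constrained sum-rate maximization. The formulation below is a hedged reconstruction from the abstract alone: the symbols (SU beamformers w_k, IRS phase-shift matrix Θ, power budget P_max, interference limit Γ, effective SBS-to-PU channel h_p including the IRS-reflected link) are our notation, not necessarily the paper's exact model.

```latex
% Hypothetical reconstruction of the optimization problem described in the
% abstract; all notation is ours, not the paper's.
\begin{align}
\max_{\{\mathbf{w}_k\},\,\boldsymbol{\Theta}}\quad
  & \sum_{k=1}^{K} \log_2\!\bigl(1 + \mathrm{SINR}_k\bigr) \\
\text{s.t.}\quad
  & \sum_{k=1}^{K} \lVert \mathbf{w}_k \rVert^2 \le P_{\max}
    && \text{(SBS transmit-power budget)} \nonumber \\
  & \sum_{k=1}^{K} \bigl|\mathbf{h}_p^{\mathsf{H}} \mathbf{w}_k\bigr|^2 \le \Gamma
    && \text{(PU interference tolerance)} \nonumber \\
  & \lvert \theta_n \rvert = 1,\ n = 1,\dots,N
    && \text{(unit-modulus IRS phase shifts)} \nonumber
\end{align}
```

Here SINR_k depends on both the beamformers and Θ through the IRS-reflected cascaded channel, which couples the active and passive variables and makes the problem non-convex; this coupling is what motivates the DRL approach described next.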
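As a rough sketch of the DDPG training loop outlined in the Methods paragraph, the following PyTorch code shows the actor-critic structure, target networks, and soft updates. The state/action dimensions, network sizes, and the random batch standing in for environment interaction are assumptions for illustration only; in the paper's setting, the reward would come from an environment that computes the SU sum rate and subtracts penalties for constraint violations.

```python
# Minimal DDPG sketch for the joint active/passive beamforming problem.
# Dimensions, architectures, and the toy batch are hypothetical; this is
# an illustration of the algorithm class, not the authors' implementation.
import torch
import torch.nn as nn

STATE_DIM = 64    # assumed: flattened CSI + per-SU SINR values
ACTION_DIM = 24   # assumed: real/imag beamformer entries + N IRS phase angles

class Actor(nn.Module):
    """Maps the observed state to a continuous action (beamformers + phases)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM), nn.Tanh())  # bounded continuous actions
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Estimates Q(s, a) under the sum-rate-minus-penalty reward."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
            nn.Linear(256, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
actor_t, critic_t = Actor(), Critic()           # target networks
actor_t.load_state_dict(actor.state_dict())
critic_t.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
GAMMA, TAU = 0.99, 0.005

def soft_update(target, source):
    """Polyak averaging of target-network parameters."""
    for tp, p in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1 - TAU).add_(TAU * p.data)

def train_step(batch):
    """One DDPG update from a replay-buffer batch (s, a, r, s')."""
    s, a, r, s2 = batch
    with torch.no_grad():                       # bootstrapped TD target
        q_target = r + GAMMA * critic_t(s2, actor_t(s2))
    critic_loss = nn.functional.mse_loss(critic(s, a), q_target)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    actor_loss = -critic(s, actor(s)).mean()    # deterministic policy gradient
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
    soft_update(actor_t, actor); soft_update(critic_t, critic)

# Toy usage with a random batch, standing in for environment interaction.
batch = (torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM),
         torch.randn(32, 1), torch.randn(32, STATE_DIM))
train_step(batch)
```

One convenient way to satisfy the unit-modulus constraint in such a design is to let the actor output phase angles (the tanh output scaled to [-π, π]) and map them to reflection coefficients e^{jθ_n}, which have unit modulus by construction.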
Classification: TN929.5 [Electronics and Telecommunications / Communication and Information Systems]