Authors: LI Guoquan (李国权)[1], CHENG Tao (程涛), GUO Yongcun (郭永存), PANG Yu (庞宇), LIN Jinzhao (林金朝) (School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Chongqing Key Laboratory of Optoelectronic Information Sensing and Microsystems, Chongqing 400065, China)
Affiliations: [1] School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; [2] Chongqing Key Laboratory of Optoelectronic Information Sensing and Microsystems, Chongqing 400065, China
Source: 《电子与信息学报》 (Journal of Electronics & Information Technology), 2025, Issue 3, pp. 657-665 (9 pages)
Funding: National Natural Science Foundation of China (U21A20447); Chongqing Natural Science Foundation Innovation Group Project (cstc2020jcyj-cxttX0002).
Abstract: To further improve the spectrum utilization of multi-user wireless communication systems, this paper proposes a deep-reinforcement-learning-based sum-rate maximization algorithm for secondary users in an Intelligent Reflecting Surface (IRS)-assisted cognitive radio network. First, a resource allocation model that jointly optimizes the secondary base station (SBS) beamforming and the IRS phase-shift matrix is established, subject to the maximum transmit power of the SBS, the interference tolerance of the primary users to the SBS, and the unit-modulus constraint on the IRS phase-shift matrix. A joint active and passive beamforming algorithm based on the Deep Deterministic Policy Gradient is then proposed to optimize both sets of variables and maximize the secondary-user sum rate. Simulation results show that the proposed algorithm achieves sum-rate performance close to that of conventional optimization algorithms at a lower time complexity.

Objective: With the rapid development of wireless communication technologies, the demand for spectrum resources has increased significantly. Cognitive Radio (CR) has emerged as a promising solution for improving spectrum utilization by enabling Secondary Users (SUs) to access licensed spectrum bands without causing harmful interference to Primary Users (PUs). However, traditional CR networks struggle to achieve high spectral efficiency because they have limited control over the wireless environment. Intelligent Reflecting Surfaces (IRS) have recently been introduced as a technology that enhances communication performance by dynamically reconfiguring the propagation environment. This paper aims to maximize the sum rate of SUs in an IRS-assisted CR network by jointly optimizing the active beamforming at the Secondary Base Station (SBS) and the passive beamforming at the IRS, subject to constraints on the maximum transmit power of the SBS, the interference tolerance of the PUs, and the unit modulus of the IRS phase shifts.

Methods: To address this non-convex, highly coupled optimization problem, a Deep Reinforcement Learning (DRL)-based algorithm is proposed. The problem is formulated as a Markov Decision Process (MDP) in which the state space comprises the Channel State Information (CSI) of the entire system and the Signal-to-Interference-plus-Noise Ratio (SINR) of the SU network, while the action space consists of the SBS beamforming vectors and the IRS phase-shift matrix. The reward function is designed to maximize the sum rate of the SUs while penalizing constraint violations. The Deep Deterministic Policy Gradient (DDPG) algorithm is used to solve the MDP owing to its ability to handle continuous action spaces. The DDPG framework consists of an actor network, which outputs the actions, and a critic network, which evaluates those actions against the reward function. Training proceeds by interacting with the environment to learn the optimal policy, and the algorithm is tuned to ensure convergence and robustness.
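The joint design described in the abstract can be summarized as a constrained sum-rate maximization. The formulation below is a hedged reconstruction from the abstract alone: the symbols (SU beamformers w_k, IRS phase-shift matrix Θ, power budget P_max, interference limit Γ, effective SBS-to-PU channel h_p including the IRS-reflected link) are our notation, not necessarily the paper's exact model.

```latex
% Hypothetical reconstruction of the optimization problem described in the
% abstract; all notation is ours, not the paper's.
\begin{align}
\max_{\{\mathbf{w}_k\},\,\boldsymbol{\Theta}}\quad
  & \sum_{k=1}^{K} \log_2\!\bigl(1 + \mathrm{SINR}_k\bigr) \\
\text{s.t.}\quad
  & \sum_{k=1}^{K} \lVert \mathbf{w}_k \rVert^2 \le P_{\max}
    && \text{(SBS transmit-power budget)} \nonumber \\
  & \sum_{k=1}^{K} \bigl|\mathbf{h}_p^{\mathsf{H}} \mathbf{w}_k\bigr|^2 \le \Gamma
    && \text{(PU interference tolerance)} \nonumber \\
  & \lvert \theta_n \rvert = 1,\ n = 1,\dots,N
    && \text{(unit-modulus IRS phase shifts)} \nonumber
\end{align}
```

Here SINR_k depends on both the beamformers and Θ through the IRS-reflected cascaded channel, which couples the active and passive variables and makes the problem non-convex; this coupling is what motivates the DRL approach described next.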
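As a rough sketch of the DDPG training loop outlined in the Methods paragraph, the following PyTorch code shows the actor-critic structure, target networks, and soft updates. The state/action dimensions, network sizes, and the random batch standing in for environment interaction are assumptions for illustration only; in the paper's setting, the reward would come from an environment that computes the SU sum rate and subtracts penalties for constraint violations.

```python
# Minimal DDPG sketch for the joint active/passive beamforming problem.
# Dimensions, architectures, and the toy batch are hypothetical; this is
# an illustration of the algorithm class, not the authors' implementation.
import torch
import torch.nn as nn

STATE_DIM = 64    # assumed: flattened CSI + per-SU SINR values
ACTION_DIM = 24   # assumed: real/imag beamformer entries + N IRS phase angles

class Actor(nn.Module):
    """Maps the observed state to a continuous action (beamformers + phases)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM), nn.Tanh())  # bounded continuous actions
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Estimates Q(s, a) under the sum-rate-minus-penalty reward."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
            nn.Linear(256, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
actor_t, critic_t = Actor(), Critic()           # target networks
actor_t.load_state_dict(actor.state_dict())
critic_t.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
GAMMA, TAU = 0.99, 0.005

def soft_update(target, source):
    """Polyak averaging of target-network parameters."""
    for tp, p in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1 - TAU).add_(TAU * p.data)

def train_step(batch):
    """One DDPG update from a replay-buffer batch (s, a, r, s')."""
    s, a, r, s2 = batch
    with torch.no_grad():                       # bootstrapped TD target
        q_target = r + GAMMA * critic_t(s2, actor_t(s2))
    critic_loss = nn.functional.mse_loss(critic(s, a), q_target)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    actor_loss = -critic(s, actor(s)).mean()    # deterministic policy gradient
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
    soft_update(actor_t, actor); soft_update(critic_t, critic)

# Toy usage with a random batch, standing in for environment interaction.
batch = (torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM),
         torch.randn(32, 1), torch.randn(32, STATE_DIM))
train_step(batch)
```

One convenient way to satisfy the unit-modulus constraint in such a design is to let the actor output phase angles (the tanh output scaled to [-π, π]) and map them to reflection coefficients e^{jθ_n}, which have unit modulus by construction.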
Classification: TN929.5 [Electronics and Telecommunications / Communication and Information Systems]