Authors: ZHOU Quan (周权); NIU Yingtao (牛英滔) (The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, China; School of Communication Engineering, Army Engineering University of PLA, Nanjing 210007, China)
Affiliations: [1] The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, China; [2] School of Communication Engineering, Army Engineering University of PLA, Nanjing 210007, China
Source: Journal on Communications (《通信学报》), 2024, No. 7, pp. 117-126 (10 pages)
Funding: National Natural Science Foundation of China (No. 62371461)
Abstract: To improve the learning efficiency of anti-jamming algorithms based on deep reinforcement learning and enable them to adapt more quickly to unknown jamming environments, a fast deep reinforcement learning anti-jamming algorithm based on similar sample generation was proposed. By combining the similarity measurement of state-action pairs, derived from bisimulation, with an anti-jamming algorithm grounded in the deep Q-network, this algorithm was able to quickly learn effective multi-domain anti-jamming strategies in unknown, dynamic jamming environments. Specifically, once a transmission action was completed, the proposed algorithm first interacted with the environment using the deep Q-network to acquire actual state-action pairs. Then it generated a set of similar state-action pairs based on bisimulation, employing these similar pairs to produce simulated training samples. Through these operations, the algorithm was able to acquire a large number of training samples at each iteration step, thereby significantly accelerating the training process and convergence speed. Simulation results show that under comb sweep jamming and intelligent blocking jamming, the proposed algorithm converges rapidly, and its normalized throughput after convergence is significantly superior to that of the conventional deep Q-network algorithm, the Q-learning algorithm, and an improved Q-learning algorithm based on knowledge reuse.
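The sample-augmentation step described in the abstract (storing the real transition, then adding simulated transitions for state-action pairs that are bisimulation-similar to it) can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual formulation: the simplified deterministic distance (reward difference plus a scaled next-state distance), the function names, the threshold `eps`, and the toy sweep-jamming environment are all assumptions introduced here.

```python
import numpy as np

def bisim_distance(s1, a1, s2, a2, reward_fn, next_state_fn, c=0.5):
    """Simplified, deterministic bisimulation-style metric (an assumption):
    |reward difference| + c * ||next-state difference||."""
    r_diff = abs(reward_fn(s1, a1) - reward_fn(s2, a2))
    ns_diff = np.linalg.norm(next_state_fn(s1, a1) - next_state_fn(s2, a2))
    return r_diff + c * ns_diff

def augment_buffer(buffer, transition, candidates, reward_fn, next_state_fn,
                   eps=0.1):
    """Store the real transition, then add one simulated transition for
    every candidate state-action pair within distance eps of the real pair."""
    s, a, _, _ = transition
    buffer.append(transition)
    for s_c, a_c in candidates:
        if bisim_distance(s, a, s_c, a_c, reward_fn, next_state_fn) < eps:
            # A similar pair yields a simulated sample with its own
            # (model-predicted) reward and next state.
            buffer.append((s_c, a_c, reward_fn(s_c, a_c),
                           next_state_fn(s_c, a_c)))
    return buffer

# Toy environment (hypothetical): the state is the currently jammed channel,
# the action is the transmit channel; reward 1 if the transmission avoids
# the jammer, which sweeps cyclically over 4 channels.
def reward_fn(s, a):
    return 1.0 if a != s else 0.0

def next_state_fn(s, a):
    return np.array([(s + 1) % 4])

real = (0, 1, reward_fn(0, 1), next_state_fn(0, 1))
candidates = [(0, 2), (0, 0), (1, 2)]  # only (0, 2) is within eps of (0, 1)
buf = augment_buffer([], real, candidates, reward_fn, next_state_fn)
print(len(buf))  # 2: one real sample plus one simulated sample
```

In a full DQN loop, each environment step would thus contribute several replay-buffer entries instead of one, which is the mechanism the abstract credits for the faster convergence.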
Classification code: TN973.3 [Electronics and Telecommunications: Signal and Information Processing]