Authors: YIN Kaijie (殷凯杰), SHI Jia (石嘉), DUAN Guodong (段国栋), LI Lixin (李立欣) [3], SI Jiangbo (司江勃) [1]
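Affiliations: [1] School of Telecommunications Engineering, Xidian University, Xi'an 710071, China; [2] Southwest China Research Institute of Electronic Equipment, Chengdu 610036, China; [3] School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China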
Source: Acta Aeronautica et Astronautica Sinica (《航空学报》), 2024, No. 22, pp. 233-246 (14 pages)
Funding: Fund of the Key Laboratory of Electromagnetic Space Operations and Applications (JJ2021-001).
Abstract: To address the intense frequency-use conflicts among multi-functional electromagnetic equipment in complex electromagnetic environments, and considering the challenge of coupled decision-making over hybrid continuous and discrete actions, an intelligent spectrum sharing technology based on reinforcement learning is studied. First, taking into account multiple factors such as the frequency-use rules of both the own side and the jamming side, a fine-grained model of the complex electromagnetic interference environment is developed. On this basis, a method for evaluating the spectrum sharing effectiveness of integrated radar-communication equipment under multitask requirements is designed. Second, a Greedy Proximal Policy Optimization (Greedy-PPO) intelligent spectrum sharing decision algorithm is proposed. It decouples the discrete-continuous action space and uses the PPO method to optimally configure the transmission power; building on this, the Greedy method is used to solve the discrete spectrum allocation optimization problem, yielding an approximately optimal joint spectrum sharing strategy. Finally, simulation experiments verify that the Greedy-PPO algorithm improves the overall effectiveness index by 48% and 15% compared with the greedy algorithm and the DDQN algorithm, respectively, and achieves excellent spectrum utilization.
Keywords: spectrum sharing; reinforcement learning; rule-based algorithm; decision management; hybrid action space
Classification code: V243 [Aerospace Science and Technology: Flight Vehicle Design]