Greedy-PPO intelligent spectrum sharing decision for complex electromagnetic interference environments


Authors: YIN Kaijie; SHI Jia; DUAN Guodong; LI Lixin[3]; SI Jiangbo[1]

Affiliations: [1] School of Telecommunications Engineering, Xidian University, Xi'an 710071, China; [2] Southwest China Research Institute of Electronic Equipment, Chengdu 610036, China; [3] School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China

Source: Acta Aeronautica et Astronautica Sinica (《航空学报》), 2024, No. 22, pp. 233-246 (14 pages)

Funding: Foundation of the Key Laboratory of Electromagnetic Space Operations and Applications (JJ2021-001).

Abstract: To address the intense spectrum conflicts among multi-functional electromagnetic equipment in complex electromagnetic environments, and the challenge of making coupled decisions over hybrid continuous and discrete actions, this paper studies intelligent spectrum sharing based on reinforcement learning. First, taking into account factors such as the frequency-usage rules of both the friendly side and the jamming side, a fine-grained model of the complex electromagnetic interference environment is built. On this basis, a method is designed for evaluating the spectrum sharing effectiveness of integrated radar-communication equipment under multi-task requirements. Second, a Greedy Proximal Policy Optimization (Greedy-PPO) intelligent spectrum sharing decision algorithm is proposed. It decouples the discrete-continuous action space: PPO is used to configure the transmission power and, given that power configuration, a greedy method solves the discrete spectrum allocation problem, yielding an approximately optimal joint spectrum sharing strategy. Finally, simulation experiments show that Greedy-PPO improves the overall effectiveness metric by 48% and 15% compared with the greedy algorithm and the DDQN algorithm, respectively, and achieves excellent spectrum utilization.
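The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the environment model, the effectiveness metric, or the PPO network, so everything below is an assumption made for illustration. The function name `greedy_channel_assignment`, the `gains` and `interference` inputs, and the Shannon-style rate used as a score are all hypothetical, and the `powers` argument is a stand-in for what a trained PPO policy would output. The sketch shows only the second stage: once the continuous powers are fixed, the discrete channel allocation is solved greedily.

```python
import math


def greedy_channel_assignment(powers, gains, interference, noise=1e-3):
    """Assign each device at most one distinct channel, greedily by rate.

    powers[i]       -- transmit power of device i (stand-in for a PPO output)
    gains[i][k]     -- gain of device i on channel k (hypothetical model)
    interference[k] -- jammer power seen on channel k (hypothetical model)
    """
    n_dev, n_ch = len(powers), len(interference)

    # Score every (device, channel) pair at the given, already fixed powers.
    candidates = []
    for i in range(n_dev):
        for k in range(n_ch):
            sinr = powers[i] * gains[i][k] / (interference[k] + noise)
            candidates.append((math.log2(1.0 + sinr), i, k))

    # Greedy step: repeatedly take the best remaining pair, skipping
    # devices that already have a channel and channels already in use.
    candidates.sort(key=lambda c: -c[0])
    assignment, used_channels, total_rate = {}, set(), 0.0
    for rate, i, k in candidates:
        if i in assignment or k in used_channels:
            continue
        assignment[i] = k
        used_channels.add(k)
        total_rate += rate
    return assignment, total_rate


# Two devices, three channels; channel 0 is heavily jammed.
powers = [1.0, 1.0]                       # placeholder for PPO-chosen powers
gains = [[1.0, 0.5, 0.4], [0.8, 0.9, 0.3]]
interference = [10.0, 0.0, 0.0]
assignment, total_rate = greedy_channel_assignment(
    powers, gains, interference, noise=1.0
)
```

In a full hybrid-action loop, the total score returned here could serve as the reward signal for the continuous power policy, so that the learner only has to explore the continuous part of the action space. That coupling is likewise an assumption of this sketch rather than a detail stated in the abstract.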

Keywords: spectrum sharing; reinforcement learning; rule-based algorithm; decision management; hybrid action space

Classification: V243 [Aerospace Science and Technology: Aircraft Design]

 
