Model-free Reinforcement Learning Based Radar Beam Multi-stage Management Method


Authors: MA Zhijie; WANG Yuanhang; JIANG Jiacai; ZHANG Tianxian (School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; No.10 Research Institute of China Electronics Technology Group Corporation, Chengdu 610036, China)

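Affiliations: [1] School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731; [2] No.10 Research Institute of China Electronics Technology Group Corporation, Chengdu 610036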

Source: Modern Radar, 2022, No. 11, pp. 44-50 (7 pages)

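Funding: National Natural Science Foundation of China (61971109); National Defense Science and Technology Innovation Special Zone Support Project (Key Project); Fundamental Research Funds for the Central Universities (ZYGX2018J009).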

Abstract: Fire control radar (FCR) often faces the challenge of repeater jamming. Considering the multi-stage confrontation between the two, this paper proposes a beam dwell time optimization method based on model-free reinforcement learning to address multi-stage radar beam management under an unknown environment model. First, a Markov decision process with an unknown environment model is established for multi-stage beam dwell time optimization; to evaluate radar detection performance, the expected search-to-lock-on time of the FCR is used as the evaluation criterion. Then, to overcome the challenge of the unknown environment model, a reinforcement learning framework for multi-stage beam dwell time optimization is formulated, and a Q-learning-based dwell time optimization method is proposed on top of this framework. Finally, numerical simulations verify the effectiveness of the proposed method.

Keywords: radar beam management; multi-stage dwell time optimization; unknown environment model; Q-learning

Classification: TN972 [Electronics and Telecommunications: Signal and Information Processing]
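
Illustrative sketch: the abstract describes Q-learning over a multi-stage dwell time decision process whose environment model is unknown to the radar. The following is a minimal tabular Q-learning sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the three stages, the candidate dwell times, the exponential success model inside `step`, the fall-back-to-stage-0 rule on failure, and all hyperparameters. The reward is the negative dwell time, so maximizing undiscounted return corresponds to minimizing the expected time to lock-on.

```python
import math
import random

# Hypothetical toy setup (not from the paper): 3 stages before lock-on,
# and a small discrete set of candidate dwell times (arbitrary time units).
N_STAGES = 3
DWELL_TIMES = [1.0, 2.0, 4.0, 8.0]


def step(stage, action, rng):
    """Toy environment, hidden from the learner (assumed model).

    Assumed success probability: p = 1 - exp(-dwell / scale), with a larger
    scale at later stages standing in for stronger jamming. On success the
    radar advances one stage; on failure it falls back to stage 0.
    """
    dwell = DWELL_TIMES[action]
    scale = 2.0 * (stage + 1)
    success = rng.random() < 1.0 - math.exp(-dwell / scale)
    next_stage = stage + 1 if success else 0
    done = next_stage == N_STAGES  # lock-on reached
    return next_stage, -dwell, done  # reward is the negative time spent


def q_learning(episodes=20000, alpha=0.05, epsilon=0.1, seed=0):
    """Tabular Q-learning with an undiscounted time-to-lock-on objective."""
    rng = random.Random(seed)
    q = [[0.0] * len(DWELL_TIMES) for _ in range(N_STAGES)]
    for _ in range(episodes):
        stage, done = 0, False
        while not done:
            # Epsilon-greedy action selection over candidate dwell times.
            if rng.random() < epsilon:
                action = rng.randrange(len(DWELL_TIMES))
            else:
                action = max(range(len(DWELL_TIMES)), key=lambda a: q[stage][a])
            next_stage, reward, done = step(stage, action, rng)
            # Model-free update: only sampled transitions are used.
            target = reward if done else reward + max(q[next_stage])
            q[stage][action] += alpha * (target - q[stage][action])
            stage = next_stage
    return q


def greedy_policy(q):
    """Return the dwell time with the highest learned Q-value at each stage."""
    return [DWELL_TIMES[max(range(len(row)), key=lambda a: row[a])] for row in q]


if __name__ == "__main__":
    q_table = q_learning()
    print("Greedy dwell time per stage:", greedy_policy(q_table))
    print("Estimated expected time to lock-on:", -max(q_table[0]))
```

The learner never reads the success model inside `step`; it only observes sampled transitions and rewards, which is what makes the approach model-free. In this toy setting, `-max(q_table[0])` serves as a rough estimate of the expected time to lock-on under the learned policy.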
