Sensor Management Method Based on Deep Reinforcement Learning in Extended Target Tracking

(Original Chinese title: 扩展目标跟踪中基于深度强化学习的传感器管理方法)


Authors: ZHANG Hong-Yun; CHEN Hui[1]; ZHANG Wen-Xu (School of Electrical Engineering and Information Engineering, Lanzhou University of Technology, Lanzhou 730050)

Affiliation: [1] School of Electrical Engineering and Information Engineering, Lanzhou University of Technology, Lanzhou 730050

Source: Acta Automatica Sinica (《自动化学报》), 2024, No. 7, pp. 1417-1431 (15 pages)

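Funding: National Natural Science Foundation of China (62163023, 62366031, 62363023, 61873116); Industrial Support Program of the Gansu Provincial Department of Education (2021CYZC-02); 2024 Gansu Province Key Talent Project.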

Abstract: To address the sensor management problem in the optimization of extended target tracking (ETT), this paper models the extended target with the random matrices model (RMM) and proposes a sensor management method based on deep reinforcement learning (DRL). First, within the theoretical framework of the partially observed Markov decision process (POMDP), a basic method of sensor management for extended target tracking based on the twin delayed deep deterministic policy gradient (TD3) algorithm is presented. Second, the Gaussian Wasserstein distance (GWD) is used to compute the information gain between the prior and posterior probability densities of the extended target, which gives a comprehensive evaluation of the multi-feature estimation information of the extended target; this information gain is then used to construct the reward function of the TD3 algorithm. Third, the derived reward function drives the optimal decision of the DRL-based sensor management method. Finally, simulation experiments on extended target tracking optimization verify the effectiveness of the proposed method.
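The abstract names the Gaussian Wasserstein distance as the measure of information gain between prior and posterior densities. As a minimal sketch of that idea, the snippet below evaluates the standard closed-form squared 2-Wasserstein distance between two Gaussians, W^2 = ||m_a - m_b||^2 + tr(S_a + S_b - 2 (S_b^{1/2} S_a S_b^{1/2})^{1/2}). The function names `psd_sqrt` and `gaussian_wasserstein_sq` are hypothetical, and treating this value directly as a reward is an assumption made for illustration; the abstract does not state how the paper maps the RMM densities to Gaussians or how it scales the reward.

```python
import numpy as np


def psd_sqrt(mat):
    """Principal square root of a symmetric positive semidefinite matrix."""
    sym = 0.5 * (mat + mat.T)               # remove numerical asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym)  # eigendecomposition of a symmetric matrix
    eigvals = np.clip(eigvals, 0.0, None)   # clamp tiny negative eigenvalues to zero
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def gaussian_wasserstein_sq(mean_a, cov_a, mean_b, cov_b):
    """Squared 2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    mean_a = np.asarray(mean_a, dtype=float)
    mean_b = np.asarray(mean_b, dtype=float)
    cov_a = np.asarray(cov_a, dtype=float)
    cov_b = np.asarray(cov_b, dtype=float)

    root_b = psd_sqrt(cov_b)
    cross = psd_sqrt(root_b @ cov_a @ root_b)

    mean_term = float(np.sum((mean_a - mean_b) ** 2))
    cov_term = float(np.trace(cov_a + cov_b - 2.0 * cross))
    # cov_term is non-negative in exact arithmetic; guard against round-off.
    return mean_term + max(cov_term, 0.0)


# Hypothetical prior (before a measurement) and posterior (after a measurement).
prior_mean, prior_cov = np.zeros(2), 4.0 * np.eye(2)
post_mean, post_cov = np.array([3.0, 4.0]), np.eye(2)

# Assumption: use the distance itself as the information-gain reward.
reward = gaussian_wasserstein_sq(prior_mean, prior_cov, post_mean, post_cov)
print(reward)
```

The matrix square roots are computed by eigendecomposition so that the sketch depends only on NumPy; `scipy.linalg.sqrtm` would be an alternative.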

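Keywords: sensor management; extended target tracking; deep reinforcement learning; twin delayed deep deterministic policy gradient; information gain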

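Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TP212 [Automation and Computer Technology: Control Science and Engineering]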

 
