DS-MCTS: A Deep Sequential Monte-Carlo Tree Search Method for Source Navigation in Unknown Environments

Cited by: 2

Authors: DUAN Shi-hong [1,2]; HE Hao; XU Cheng; YIN Nan [1]; WANG Ran

Affiliations: [1] School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; [2] Shunde Graduate School, University of Science and Technology Beijing, Foshan, Guangdong 528399, China

Source: Acta Electronica Sinica (《电子学报》), 2022, No. 7, pp. 1744-1752 (9 pages)

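Funding: National Natural Science Foundation of China (No. 62101029); Postdoctoral Innovative Talent Support Program (No. BX20190033); Guangdong Basic and Applied Basic Research Foundation Joint Fund (No. 2019A1515110325); China Postdoctoral Science Foundation General Program (No. 2020M670135); Postdoctoral Research Fund of Shunde Graduate School, University of Science and Technology Beijing (No. 2020BH001); Fundamental Research Funds for the Central Universities (No. 06500127).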

Abstract: Source navigation has important applications in emergency rescue, industrial inspection, and other hazardous operations. In practice, the state of the environment often cannot be fully observed; that is, the environment is partially observable. Making real-time decisions from partial observations, and effectively predicting the future state of the system from historical sequence information, is a challenging problem for research on source navigation. This paper proposes a source navigation algorithm and system framework based on deep sequential Monte-Carlo tree search (DS-MCTS). A sequential action prediction (SAP) network provides prior knowledge for MCTS decision-making, and a reward allocation prediction (RAP) network is built to improve the accuracy of reward allocation, ultimately realizing optimal decision-making for the system. Simulation results show that DS-MCTS provides an end-to-end source navigation solution that can effectively predict agent actions and achieve efficient, robust path planning.

Keywords: source navigation; Monte-Carlo tree search; sequential decision-making; path planning; deep reinforcement learning

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
