Authors: DUAN Shi-hong (段世红)[1,2]; HE Hao (何昊); XU Cheng (徐诚); YIN Nan (殷楠)[1]; WANG Ran (王然)
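Affiliations: [1] School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; [2] Shunde Graduate School, University of Science and Technology Beijing, Foshan, Guangdong 528399, China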
Source: Acta Electronica Sinica (《电子学报》), 2022, No. 7, pp. 1744-1752 (9 pages)
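Funding: National Natural Science Foundation of China (No.62101029); Postdoctoral Innovative Talent Support Program (No.BX20190033); Guangdong Basic and Applied Basic Research Foundation, Joint Fund (No.2019A1515110325); China Postdoctoral Science Foundation, General Program (No.2020M670135); Postdoctoral Research Fund of Shunde Graduate School, University of Science and Technology Beijing (No.2020BH001); Fundamental Research Funds for the Central Universities (No.06500127).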
Abstract: Source navigation is important in emergency rescue, industrial inspection, and other hazardous operations. In practice, the state of the environment is often difficult to observe completely; that is, the environment is partially observable. Making real-time decisions from the partial observations available, and effectively predicting the system's future state from historical sequence information, are challenging problems for research on source navigation. This paper proposes a source navigation algorithm and system framework based on deep sequential Monte-Carlo tree search (DS-MCTS). A sequential action prediction (SAP) network provides prior knowledge for MCTS decision-making, and a reward allocation prediction (RAP) network is built to improve the accuracy of reward allocation, ultimately enabling optimal decision-making for the system. Simulation experiments show that DS-MCTS provides an end-to-end solution to source navigation that effectively predicts the agent's actions and achieves efficient, robust path planning.
Keywords: source navigation; Monte-Carlo tree search; sequential decision-making; path planning; deep reinforcement learning
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]