Migration of mobile robot navigation strategy based on successor reinforcement learning (基于后继强化学习的智能小车导航策略的迁移)

Authors: QIAN Hao (钱浩), HE Jun (何军) [1,2], HU Zhaohua (胡昭华) [1,3]

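Affiliations: [1] School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China; [2] School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China; [3] Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China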

Source: Modern Electronics Technique (《现代电子技术》), 2022, No. 6, pp. 169-174 (6 pages)

Funding: National Natural Science Foundation of China (61601230).

Abstract: Current smart-car navigation policies must be retrained at considerable time cost whenever the environment changes. To address this problem, the paper proposes a smart-car navigation policy based on deep reinforcement learning. The policy uses successor reinforcement learning as the smart car's decision-making framework and combines it with feature mapping, so that a navigation policy learned in a previous environment can be transferred to a new one. First, a successor reinforcement learning control model is built in the initial environment, and a feature mapping network is added at the output of the model's feature extractor so that features of the new environment can be mapped into the old environment; the model is trained with the image information the smart car captures from the environment as its input state. The model is then transferred to the new environment and trained there, reusing the old environment's policy through the feature mapping and thereby reducing the training time required for environment transfer. Finally, training and validation are carried out in a simulation environment. The experimental results show that the proposed method completes the navigation task autonomously while reducing training time, and that it adapts to a new environment faster during environment transfer than traditional reinforcement learning methods.
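The transfer idea summarized in the abstract can be illustrated with a minimal sketch of the successor-feature decomposition, in which the action value factors as Q(s, a) = psi(s, a) . w, where psi is the expected discounted sum of state features and w is a reward weight vector. This is a generic illustration under stated assumptions, not the authors' model: the paper works with image inputs and trained networks whose details are not given in this record, and every name below (`td_update_psi`, `map_features`, `converge_self_loop`), the linear feature map, the learning rate and the discount factor are hypothetical.

```python
# Sketch only: tabular successor features with a linear feature map.
# Names and hyperparameters are illustrative assumptions, not from the paper.

GAMMA = 0.9  # assumed discount factor


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def td_update_psi(psi, phi, next_psi, lr=0.1, gamma=GAMMA):
    """One TD step: psi <- psi + lr * (phi + gamma * next_psi - psi)."""
    return [p + lr * (f + gamma * n - p) for p, f, n in zip(psi, phi, next_psi)]


def q_value(psi, w):
    """Action value as successor features times reward weights."""
    return dot(psi, w)


def map_features(phi_new, feature_map):
    """Map new-environment features into the old feature space (linear stand-in
    for the feature mapping network mentioned in the abstract)."""
    return [dot(row, phi_new) for row in feature_map]


def converge_self_loop(phi, steps):
    """Learn psi for a state that always transitions to itself."""
    psi = [0.0] * len(phi)
    for _ in range(steps):
        psi = td_update_psi(psi, phi, psi)
    return psi
```

For a self-looping state the learned psi approaches phi / (1 - GAMMA), so changing only `w` re-scores the same psi for a new reward, while `map_features` lets features observed in a new environment be expressed in the space the old psi was learned in.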

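Keywords: reinforcement learning; autonomous navigation; smart car; path planning; environment transfer; successor features; deep learning; end-to-end decision-making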

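Classification: TN915-34 [Electronics and Telecommunications: Communication and Information Systems]; TP391 [Electronics and Telecommunications: Information and Communication Engineering]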
