Authors: QIAN Hao (钱浩), HE Jun (何军) [1,2], HU Zhaohua (胡昭华) [1,3]
Affiliations: [1] School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China; [2] School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China; [3] Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China
Source: Modern Electronics Technique (《现代电子技术》), 2022, No. 6, pp. 169-174 (6 pages)
Funding: National Natural Science Foundation of China (61601230)
Abstract: Current navigation strategies for intelligent cars need a large amount of retraining time when the environment changes. To address this, the paper proposes an intelligent car navigation strategy based on deep reinforcement learning. The strategy uses successor reinforcement learning as the car's decision-making framework and combines it with feature mapping, so that the car can transfer a navigation strategy learned in a previous environment to a new one. First, a successor reinforcement learning control model is built in the initial environment, and a feature mapping network is added at the output of the model's feature extractor so that the model can map features of the new environment onto the old environment; the image information the car extracts from the environment serves as the input state for training the model. The model is then transferred to the new environment for training, where feature mapping lets it reuse the strategy from the old environment, reducing training time during environment transfer. Finally, training and validation are carried out in a simulation environment. The experimental results show that the proposed method completes the navigation task autonomously while reducing training time and, compared with traditional reinforcement learning methods, adapts to a new environment faster during environment transfer.
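The abstract gives no implementation details, so the following is only a toy illustration of the general successor-feature idea that the "successor reinforcement learning" framework builds on: values factor into successor features (discounted sums of state features under a fixed policy) and a reward weight vector. Everything in it is an assumption made for illustration: the three-state chain, the discount factor, the one-hot features, and all names (`successor_features`, `value`, `new_to_old`). In particular, the fixed lookup table `new_to_old` merely stands in for the learned feature mapping network the paper describes, and no image input or neural network is modelled.

```python
# Toy sketch of the successor-feature idea; not the paper's model.
# Every name here (GAMMA, successor_features, value, new_to_old) is hypothetical.

GAMMA = 0.9  # assumed discount factor


def successor_features(next_state, gamma=GAMMA, sweeps=200):
    """Iterate psi(s) = phi(s) + gamma * psi(next_state[s]), with one-hot phi(s).

    next_state[s] is the state reached from s under a fixed policy.
    """
    n = len(next_state)
    psi = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        # Synchronous sweep: the right-hand side reads the previous psi.
        psi = [
            [(1.0 if i == s else 0.0) + gamma * psi[next_state[s]][i] for i in range(n)]
            for s in range(n)
        ]
    return psi


def value(psi_s, w):
    """Value as a dot product of successor features and reward weights."""
    return sum(p * w_i for p, w_i in zip(psi_s, w))


# Old environment: a 3-state chain 0 -> 1 -> 2, where state 2 is absorbing.
psi_old = successor_features([1, 2, 2])
w = [0.0, 0.0, 1.0]  # assumed reward of 1 per step spent in state 2

# Stand-in for the feature mapping: a fixed lookup from new-environment
# state indices to old-environment state indices (the paper learns a network).
new_to_old = [2, 0, 1]


def value_in_new_env(new_state):
    """Reuse psi_old in the new environment through the mapping."""
    return value(psi_old[new_to_old[new_state]], w)


for s in range(3):
    print(s, round(value_in_new_env(s), 3))
```

In this sketch nothing is relearned for the new environment: once its states are mapped onto old ones, the old successor features and reward weights are reused directly, which is the intuition behind the reduced training time claimed in the abstract.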
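Keywords: reinforcement learning; autonomous navigation; intelligent car; path planning; environment transfer; successor features; deep learning; end-to-end decision-making
Classification: TN915-34 [Electronics and Telecommunications: Communication and Information Systems]; TP391 [Electronics and Telecommunications: Information and Communication Engineering]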