Authors: DAI Shan-shan; LIU Quan [1,2,3,4]
Affiliations: [1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China; [2] Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, Jiangsu 215006, China; [3] Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China; [4] Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
Source: Computer Science, 2021, Issue 9, pp. 235-243 (9 pages)
Funding: National Natural Science Foundation of China (61772355, 61702055, 61502323, 61502329); Major Project of Natural Science Research of Jiangsu Higher Education Institutions (18KJA520011, 17KJA520004); Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04, 93K172017K18); Suzhou Applied Basic Research Program, Industrial Part (SYG201422); Priority Academic Program Development of Jiangsu Higher Education Institutions
Abstract: With the development of artificial intelligence, research on autonomous driving is growing rapidly. Deep reinforcement learning (DRL) is one of the main research methods in this field, and safe exploration is an active research topic within it. Although DRL algorithms have achieved excellent performance in many control tasks, most of them place no safety constraints on exploration in order to improve sample coverage: in each state the agent must select an action to execute even when that action drives the driverless car into a dangerous state, such as leaving the track or crashing, which degrades performance or fails the task outright. To solve this problem, this paper proposes a constrained soft actor-critic algorithm (CSAC), in which a 'NO-OP' (no-option) mechanism identifies and replaces inappropriate actions. The method first limits the environmental reward in a reasonable way: because an overly large steering angle makes the driverless car shake, a penalty term is added to the reward function so that the car avoids dangerous states as far as possible. CSAC additionally constrains the agent's actions: when an action selected in the current state causes the car to leave the track or collide, that action is marked as constrained, and in subsequent training these constraints guide the car toward better action choices. To demonstrate its advantages, CSAC is applied to the autonomous-driving lane-keeping task and compared with the SAC algorithm. The results show that CSAC, with its safety mechanism, effectively avoids unsafe actions, improves stability during driving, and speeds up model training. The contributions of this paper are as follows: first, an action-constraint function is incorporated into the SAC algorithm, achieving faster learning and higher stability; second, a reward-setting framework is proposed that overcomes the shaking and instability of driverless cars, achieving better performance; finally, the model is trained in a Unity virtual environment for evaluation and successfully transplanted to a Donkey driverless car equipped with a Raspberry Pi, further verifying the model's generality.
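The abstract describes two mechanisms: a steering-angle penalty added to the reward, and a constraint set that marks actions observed to cause off-track events or collisions so they can be replaced later. The sketch below illustrates both ideas in Python under stated assumptions; the function names, the discretization scheme, the retry budget, and all coefficients are hypothetical illustrations, not the paper's exact formulation.

```python
# Illustrative sketch of the two safety mechanisms described in the abstract.
# All names and coefficients are assumptions for illustration only.

def shaped_reward(base_reward, steering_angle, max_angle=0.5, penalty_coef=1.0):
    """Reward shaping: subtract a penalty proportional to how far the
    steering angle exceeds a comfort threshold, discouraging the large
    turns that make the car shake."""
    excess = max(abs(steering_angle) - max_angle, 0.0)
    return base_reward - penalty_coef * excess


class ActionConstraint:
    """Action constraint: (state, action) pairs previously observed to
    cause an off-track event or collision are marked as constrained; at
    action-selection time the agent resamples from the policy (a stand-in
    for the paper's 'NO-OP' replacement) until an unconstrained action is
    found or a retry budget is exhausted."""

    def __init__(self, n_retries=10):
        self.blocked = set()        # (discretized state, discretized action) pairs
        self.n_retries = n_retries

    def mark(self, state_key, action_key):
        """Record an action that led to leaving the lane or crashing."""
        self.blocked.add((state_key, action_key))

    def select(self, policy_sample, state_key, discretize):
        """Sample from the SAC policy, rejecting constrained actions.

        policy_sample: zero-argument callable returning a fresh action
        discretize:    maps a continuous action to a hashable key
        """
        action = policy_sample()
        for _ in range(self.n_retries):
            if (state_key, discretize(action)) not in self.blocked:
                return action
            action = policy_sample()  # replace the unsafe action
        return action                 # fall back to the last sample
```

In this reading, the penalty shapes behavior gradually through the value function, while the constraint set acts as a hard filter at selection time; the paper's actual CSAC update may integrate the constraint into training differently.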
Keywords: safe autonomous driving; deep reinforcement learning; soft actor-critic; lane keeping; driverless car
Classification: TP181 [Automation and Computer Technology - Control Theory and Control Engineering]