Author(s): Gao Zhenhai[1], Yan Xiangtong[1], Gao Fei[1]
Affiliation: [1] State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022
Source: Automotive Engineering (汽车工程), 2022, No. 7, pp. 969-975 (7 pages)
Funding: Supported by the National Key R&D Program of China (2017YFB0102601) and the National Natural Science Foundation of China (51775236, U1564214).
Abstract: Obtaining autonomous driving decision-making strategies from human driver data is a current hot spot in autonomous driving research. Most classic reinforcement learning decision-making methods construct the reward function by hand, from formulas related to safety, comfort, and economy, and the resulting decision-making strategies still differ considerably from those of human drivers. This paper uses a maximum margin inverse reinforcement learning (IRL) algorithm: taking the driver's driving data as expert demonstrations, it learns a corresponding reward function and realizes driver-imitating longitudinal autonomous driving decision-making. Simulation test results show that, compared with the reinforcement learning method, the inverse reinforcement learning method extracts the reward function automatically from the driver's data, which reduces the difficulty of constructing the reward function, and the resulting decision-making strategy is more consistent with the driver's behavior.
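The abstract names the technique (maximum margin IRL over expert driving demonstrations) but gives no implementation details. Below is a minimal sketch of the projection variant of max-margin IRL (Abbeel & Ng, 2004), assuming a reward linear in state features, R(s) = wᵀφ(s). The feature map phi (speed, headway, acceleration for the longitudinal task), the function names, and the discount factor are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.99):
    """Empirical discounted feature expectations:
    mu = mean over trajectories of sum_t gamma^t * phi(s_t)."""
    mus = [
        sum(gamma ** t * phi(s) for t, s in enumerate(traj))
        for traj in trajectories
    ]
    return np.mean(mus, axis=0)

def max_margin_irl_step(mu_expert, candidate_mus):
    """One round of the projection variant of max-margin IRL
    (Abbeel & Ng, 2004). Returns reward weights w, with
    R(s) = w . phi(s), and the current margin ||w||; the outer
    loop stops when the margin falls below a tolerance."""
    mu_bar = candidate_mus[0]
    for mu in candidate_mus[1:]:
        # Orthogonally project mu_expert onto the line through
        # mu_bar in direction (mu - mu_bar): the point on that
        # line closest to the expert's feature expectations.
        d = mu - mu_bar
        mu_bar = mu_bar + ((d @ (mu_expert - mu_bar)) / (d @ d)) * d
    w = mu_expert - mu_bar
    return w, np.linalg.norm(w)

# Hypothetical longitudinal feature map (field names are illustrative):
# ego speed, time headway to the lead vehicle, and acceleration.
phi = lambda s: np.array([s["speed"], s["headway"], s["accel"]])
```

In the full loop one would alternate: train a reinforcement learning policy under the current reward w, estimate its feature expectations with feature_expectations, append them to candidate_mus, and repeat max_margin_irl_step until the margin is small, at which point the learned policy's feature expectations, and hence its behavior, approximately match the driver's.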