Authors: Yan Ruidong [1]; Gao Hongbo [2]
Affiliations: [1] School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100091, China; [2] Department of Automation, University of Science and Technology of China, Hefei, Anhui 230026, China
Source: China Modern Educational Equipment (《中国现代教育装备》), 2023, No. 11, pp. 126-128 (3 pages)
Abstract: This paper analyzes problems in the teaching of reinforcement learning courses and introduces a "path optimization" case into the course, exploring a teaching method in which a single case ties together the core reinforcement learning algorithms. First, a Markov decision process model is built from the "path optimization" case. Next, the principles of the dynamic programming, Monte Carlo, and temporal-difference methods, and the differences among them, are described. Finally, using the "path optimization" case together with associated programming, the differences among the core reinforcement learning algorithms are explained more intuitively, improving the teaching quality of the reinforcement learning course.
Classification number: G642.0 [Cultural Sciences: Higher Education]