A Path Planning Method Based on an Improved Reinforcement Learning Algorithm

Authors: CHEN Song; SHEN Su-bin [2]

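Affiliations: [1] School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China; [2] National Engineering Research Center on Communication and Networking, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China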

Source: Computer Technology and Development (《计算机技术与发展》), 2025, No. 2, pp. 115-121 (7 pages)

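Funding: Sub-project of the National Key Basic Research Program of China (973 Program) (2011CB302903); Jiangsu Province Industry-University-Research Joint Innovation Fund project (BY2013095108).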

Abstract: Improving the data efficiency and decision accuracy of the Q-learning algorithm in complex environments is a key challenge in optimizing its performance. Introducing causal models into Q-learning, so that the causal relationships between variables can be used to improve performance, is an emerging and active research direction. This paper proposes a causal-model-based Q-learning algorithm, the C-Q learning (Causal-model based Q-learning) algorithm. The algorithm first builds a structural causal model from the causal relationships among the key variables that arise while the agent interacts with the environment using Q-learning. It then applies the backdoor adjustment method from causal inference theory to remove the confounding effect introduced by confounders that influence the reward. This yields more accurate Q-value estimates, identifies in each state the action most likely to obtain the highest reward, and thereby optimizes the action selection process of Q-learning. Finally, the Q-learning, Eva-Q learning, and C-Q learning algorithms are compared in simulation experiments in a grid environment. The simulation results show that C-Q learning outperforms the other two algorithms on path length, planning time, data efficiency, and decision accuracy.
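The abstract names two building blocks, backdoor adjustment and the Q-learning update, without giving the structural causal model or the confounder that the paper uses. The following is only a minimal sketch of those two generic techniques, not the C-Q learning algorithm itself. The confounder `z`, the toy `SAMPLES` data, and the function names `backdoor_adjusted_reward`, `naive_reward`, and `q_update` are all illustrative assumptions. The adjustment estimates E[R | do(A=a)] = Σ_z P(Z=z) · E[R | A=a, Z=z], and the update is the standard tabular rule Q(s,a) ← Q(s,a) + α(r + γ max_a' Q(s',a') − Q(s,a)).

```python
from collections import defaultdict

# Hypothetical logged data: (z, action, reward), where z is an assumed
# observed confounder that affects both the chosen action and the reward.
SAMPLES = (
    [(0, "up", 8.0)] * 3 + [(0, "right", 10.0)] * 1 +
    [(1, "up", 0.0)] * 1 + [(1, "right", 2.0)] * 3
)

def naive_reward(samples, action):
    """Plain conditional mean E[R | A=action], ignoring the confounder."""
    rewards = [r for _, a, r in samples if a == action]
    return sum(rewards) / len(rewards)

def backdoor_adjusted_reward(samples, action):
    """Estimate E[R | do(A=action)] = sum_z P(Z=z) * E[R | A=action, Z=z]."""
    z_counts = defaultdict(int)   # how often each confounder value occurs
    sums = defaultdict(float)     # reward sum per stratum for this action
    counts = defaultdict(int)     # sample count per stratum for this action
    total = 0
    for z, a, r in samples:
        z_counts[z] += 1
        total += 1
        if a == action:
            sums[z] += r
            counts[z] += 1
    estimate = 0.0
    for z, n_z in z_counts.items():
        if counts[z] == 0:
            raise ValueError(f"no samples of action {action!r} in stratum {z!r}")
        estimate += (n_z / total) * (sums[z] / counts[z])
    return estimate

def q_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update; q maps (state, action) to a value."""
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return q[(s, a)]

if __name__ == "__main__":
    for act in ("up", "right"):
        print(act, naive_reward(SAMPLES, act), backdoor_adjusted_reward(SAMPLES, act))
```

In this toy data the naive means rank "up" above "right", while the adjusted estimates reverse the ranking because "right" is better within every stratum of `z`. How the paper feeds adjusted estimates into action selection is not stated in the abstract, so the two functions are deliberately left unconnected here.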

Keywords: Q-learning algorithm; causal model; causal inference; confounder; backdoor adjustment

CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
