Authors: CHEN Song, SHEN Su-bin[2]
Affiliations: [1] School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China; [2] National Engineering Research Center on Communication and Networking, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
Source: Computer Technology and Development, 2025, No. 2, pp. 115-121 (7 pages)
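Funding: Subproject of the National Key Basic Research Program of China (973 Program) (2011CB302903); Jiangsu Province Industry-Academia-Research Joint Innovation Fund Project (BY2013095108).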
Abstract: Improving the data efficiency and decision accuracy of the Q-learning algorithm in complex environments is a key challenge in optimizing its performance. Introducing causal models into Q-learning, so that revealing the causal relationships between variables improves the algorithm's performance, is an emerging and active research direction. This paper proposes a Q-learning algorithm based on a causal model, the C-Q learning (Causal-model based Q-learning) algorithm. First, the algorithm builds a structural causal model from the causal relationships among the key variables that arise while the agent uses Q-learning to interact with the environment. Second, it applies the backdoor adjustment method from causal inference theory to remove the confounding effect that confounders affecting the reward introduce into the model. This yields more accurate Q-value estimates and identifies the action most likely to obtain the highest reward in each state, which optimizes the action selection process of Q-learning. Finally, the Q-learning, Eva-Q learning and C-Q learning algorithms are simulated in a grid environment. The simulation results show that C-Q learning outperforms the other two algorithms on multiple metrics, including path length, planning time, data efficiency and decision accuracy.
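This record does not give the paper's structural causal model, its variables, or its exact update rule, so the following is only an illustrative sketch of the general idea, not the authors' C-Q learning implementation. It assumes a discrete confounder Z that is logged with every transition and that, together with the state S, satisfies the backdoor criterion for the effect of action A on reward R. Under that assumption the adjusted expected reward is E[R | do(A=a), S=s] = Σ_z E[R | S=s, A=a, Z=z] · P(Z=z | S=s), and the sketch substitutes this estimate for the raw reward in a standard tabular Q-learning step. The names `BackdoorRewardEstimator`, `greedy_action` and `q_update`, the defaults `alpha=0.1` and `gamma=0.9`, and the toy data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch of backdoor-adjusted Q-learning; NOT the paper's C-Q learning.


class BackdoorRewardEstimator:
    """Tabular estimate of E[R | do(A=a), S=s], adjusting for a discrete confounder Z.

    Assumes Z is observed with every transition and, together with S, blocks
    all backdoor paths from A to R (the backdoor criterion).
    """

    def __init__(self):
        self.z_counts = defaultdict(lambda: defaultdict(int))  # s -> z -> count, estimates P(z | s)
        self.r_sum = defaultdict(float)  # (s, a, z) -> summed reward
        self.r_n = defaultdict(int)      # (s, a, z) -> number of samples

    def observe(self, s, a, z, r):
        self.z_counts[s][z] += 1
        self.r_sum[(s, a, z)] += r
        self.r_n[(s, a, z)] += 1

    def naive(self, s, a):
        """Confounded estimate E[R | S=s, A=a], pooling all strata of Z."""
        zs = self.z_counts.get(s, {})
        total = sum(self.r_sum.get((s, a, z), 0.0) for z in zs)
        n = sum(self.r_n.get((s, a, z), 0) for z in zs)
        return total / n if n else 0.0

    def adjusted(self, s, a):
        """Backdoor estimate: sum over z of E[R | s, a, z] * P(z | s)."""
        counts = self.z_counts.get(s, {})
        total = sum(counts.values())
        value = 0.0
        for z, n_z in counts.items():
            n = self.r_n.get((s, a, z), 0)
            if n:  # strata never tried with action a are skipped (a simplification)
                value += (self.r_sum[(s, a, z)] / n) * (n_z / total)
        return value


def greedy_action(estimator, s, actions):
    """Pick the action with the highest backdoor-adjusted expected reward."""
    return max(actions, key=lambda a: estimator.adjusted(s, a))


def q_update(Q, s, a, s_next, actions, r_hat, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning step with the observed reward replaced by r_hat."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r_hat + gamma * best_next - Q[(s, a)])


if __name__ == "__main__":
    est = BackdoorRewardEstimator()
    # Toy confounded log for state 0: the behaviour policy mostly takes a=1
    # when z=1 (a high-reward context) and a=0 when z=0.
    for _ in range(9):
        est.observe(0, 1, 1, 1.0)
        est.observe(0, 0, 0, 0.2)
    est.observe(0, 0, 1, 1.2)
    est.observe(0, 1, 0, 0.0)
    print(round(est.naive(0, 1), 3), round(est.naive(0, 0), 3))        # 0.9 0.3
    print(round(est.adjusted(0, 1), 3), round(est.adjusted(0, 0), 3))  # 0.5 0.7
    print(greedy_action(est, 0, [0, 1]))                               # 0

    Q = defaultdict(float)
    q_update(Q, 0, 0, 1, [0, 1], est.adjusted(0, 0))
    print(round(Q[(0, 0)], 3))                                         # 0.07
```

In the toy log, action 0 earns more than action 1 within each stratum of Z, but because the behaviour policy picks action 1 mostly in the high-reward context, the pooled estimate prefers action 1. The adjusted estimate reweights each stratum by P(Z=z | S=s) and recovers the correct ordering. Skipping strata that were never tried with a given action keeps the sketch short; a fuller version would need exploration or a prior for those cells.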
Keywords: Q-learning algorithm; causal model; causal inference; confounding factor; backdoor adjustment
CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]