Authors: WANG Hui [1]; LI Hong [1]; HE Qiu-sheng [1]; LI Zhan-long [2]
Affiliations: [1] School of Electronic and Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China; [2] School of Vehicle and Traffic Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China
Source: Computer Simulation (《计算机仿真》), 2025, No. 3, pp. 404-409, 436 (7 pages)
Funding: National Natural Science Foundation of China (52272401).
Abstract: An improved proximal policy optimization (PPO) penalty algorithm is proposed to address the poor convergence of the traditional PPO penalty algorithm during training. By replacing the constant-based adaptive update of the penalty coefficient with a function-based adaptive update, the penalty coefficient is coupled to the divergence and varies with it in a defined trend, which improves the convergence and learning reliability of the algorithm and makes it more flexible and adaptable. Simulation results show that the improved PPO penalty algorithm outperforms the traditional PPO penalty algorithm in terms of convergence and learning reliability, and a distributed PPO algorithm is used to further verify the effectiveness of the improvement, providing a new idea and method for subsequent research on reinforcement learning algorithms.
Classification: TP301.6 [Automation and Computer Technology - Computer System Architecture]
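The abstract contrasts a constant-based and a function-based adaptive update of the PPO penalty coefficient. Below is a minimal Python sketch of the two update rules, assuming the standard KL-penalty PPO setup; the exponential functional form, the KL target value, and the clipping bounds are illustrative assumptions, since the record does not give the paper's exact formula.

import math

KL_TARGET = 0.01  # assumed target KL divergence per policy update

def update_beta_constant(beta, kl, kl_target=KL_TARGET):
    # Conventional PPO-penalty rule: scale the coefficient by fixed
    # constants when the measured KL leaves a band around the target.
    if kl > 1.5 * kl_target:
        return beta * 2.0
    if kl < kl_target / 1.5:
        return beta * 0.5
    return beta

def update_beta_function(beta, kl, kl_target=KL_TARGET):
    # Function-based rule sketched from the abstract: beta is coupled
    # to the measured divergence and varies continuously with it.
    # The exponential form and the bounds below are assumptions, not
    # the paper's exact formula.
    ratio = max(min(kl / kl_target - 1.0, 1.0), -1.0)
    new_beta = beta * math.exp(ratio)
    return max(min(new_beta, 1e2), 1e-4)

# Example: per training iteration the penalized surrogate objective is
#   L(theta) = E[ r_t(theta) * A_t ] - beta * KL(pi_old || pi_theta)
# and beta is refreshed from the KL measured on the latest batch.
beta = 1.0
for kl in (0.002, 0.008, 0.03, 0.05):
    beta = update_beta_function(beta, kl)
    print(f"measured KL = {kl:.3f} -> beta = {beta:.4f}")

With the constant rule, beta jumps by factors of 2 only when the KL crosses the band edges; with the function-based rule it drifts smoothly with the divergence, which is the flexibility and adaptability the abstract attributes to the improved method.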