Surface Coverage Method Based on PPO with Adaptive Reward Function

Authors: LI Shuyi; YANG Bo [2]; CHEN Ling; SHEN Ling; TANG Wensheng (College of Information Science and Engineering, Hunan Normal University, Changsha 410081, Hunan, China; College of Engineering and Design, Hunan Normal University, Changsha 410081, Hunan, China)

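Affiliations: [1] College of Information Science and Engineering, Hunan Normal University, Changsha 410081, Hunan, China [2] College of Engineering and Design, Hunan Normal University, Changsha 410081, Hunan, China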

Source: Computer Engineering, 2025, No. 3, pp. 86-94 (9 pages)

Funding: National Natural Science Foundation of China, Youth Program (62203167).

Abstract: Existing surface coverage methods adapt poorly to changes in surface shape and achieve low coverage efficiency in robotic cleaning operations. To address this, the paper proposes SC-SRPPO, a surface coverage method based on Proximal Policy Optimization (PPO) with an adaptive reward function. First, the target surface is discretized, the covariance matrix is obtained by ball query, and the normal vectors of the point cloud are solved from it to establish a 3D surface model. Second, a state model is built by taking the coverage state features and curvature change features of the local surface point cloud as the observations of the surface model, which helps the robot's trajectory fit the surface and improves its adaptability to surface changes. Next, an adaptive reward function is constructed from the global coverage rate of the surface and a time-related exponential model, guiding the robot toward uncovered areas and improving coverage efficiency. Finally, the local surface state model, the reward function, and the PPO reinforcement learning algorithm are combined to train the robot to complete the surface coverage path planning task. Experiments were conducted on three surface models (spherical, saddle-shaped, and three-dimensional heart-shaped), with point cloud coverage rate and coverage completion time as the main evaluation metrics. The results show that SC-SRPPO achieves an average coverage rate of 90.72%. Compared with NSGA-II, PPO, and SAC, the coverage rate increases by 4.98%, 14.56%, and 27.11%, respectively, and the coverage completion time is reduced by 15.20%, 67.18%, and 62.64%, respectively. SC-SRPPO thus enables the robot to complete surface coverage tasks more efficiently while adapting to surface changes.
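Two of the steps named in the abstract can be illustrated with a short sketch. The first function estimates point cloud normals from the covariance matrix of a ball-query neighbourhood, using the standard result that the eigenvector of the smallest eigenvalue approximates the surface normal. The second is a purely hypothetical reward that combines a coverage gain with an exponential time decay; the abstract does not give the actual formula, so `adaptive_reward`, its parameters, and the `decay` constant are assumptions made only for illustration, not the authors' definition.

```python
import numpy as np

def estimate_normals(points, radius):
    """Estimate a unit normal per point from its ball-query neighbourhood."""
    points = np.asarray(points, dtype=float)
    normals = np.zeros_like(points)
    for i, p in enumerate(points):
        # Ball query: every point within `radius` of p (brute force search).
        mask = np.linalg.norm(points - p, axis=1) <= radius
        neighbours = points[mask]
        if len(neighbours) < 3:
            continue  # too few neighbours to fit a plane; normal stays zero
        centred = neighbours - neighbours.mean(axis=0)
        cov = centred.T @ centred / len(neighbours)
        # eigh returns eigenvalues in ascending order, so column 0 is the
        # direction of least variance, i.e. the estimated normal.
        # Its sign (inward/outward) is not determined by this step.
        _, eigvecs = np.linalg.eigh(cov)
        normals[i] = eigvecs[:, 0]
    return normals

def adaptive_reward(coverage_before, coverage_after, step, decay=0.01):
    """Hypothetical reward (an assumption, not the paper's formula):
    newly covered fraction of the surface, discounted exponentially in time."""
    gain = coverage_after - coverage_before
    return gain * np.exp(-decay * step)

if __name__ == "__main__":
    # A flat 5x5 patch in the z = 0 plane; normals should align with the z axis.
    patch = np.array([[x, y, 0.0] for x in range(5) for y in range(5)])
    print(estimate_normals(patch, radius=1.5)[:3])
    # The same coverage gain earns less reward later in the episode.
    print(adaptive_reward(0.2, 0.3, step=0), adaptive_reward(0.2, 0.3, step=100))
```

The brute-force neighbour search keeps the sketch dependency-free; for a large point cloud a spatial index such as a k-d tree would normally replace it.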

Keywords: cleaning robot; surface coverage; path planning; reinforcement learning; Proximal Policy Optimization (PPO)

Classification: TP242.3 [Automation and Computer Technology: Detection Technology and Automation Devices]
