Autonomous Obstacle Avoidance Method for Unmanned Surface Vehicles Based on Improved Proximal Policy Optimization


Authors: KONG Chao; WANG Wei; HUANG Subin; ZHANG Yi; MENG Dan (School of Computer and Information, Anhui Polytechnic University, Wuhu, Anhui 241000, China; OPPO Research Institute, Shenzhen, Guangdong 518000, China)

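Affiliations: [1] School of Computer and Information, Anhui Polytechnic University, Wuhu, Anhui 241000, China [2] OPPO Research Institute, Shenzhen, Guangdong 518000, China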

Source: Computer Science (《计算机科学》), 2025, No. 4, pp. 40-48 (9 pages)

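Funding: Scientific Research Project of Higher Education Institutions of Anhui Province (2023AH050914, 2024AH052239); Provincial Quality Engineering Project of Higher Education Institutions of Anhui Province (2023zybj018); Natural Science Foundation of Anhui Province (2308085MF220); Wuhu Science and Technology Program (2023pt07, 2023ly13); Undergraduate Teaching Quality Improvement Program of Anhui Polytechnic University (2022lzyybj02, 2023jyxm15, 2024jyxm76).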

Abstract: Autonomous obstacle avoidance has become a critical challenge for expanding the application scenarios of unmanned surface vehicles (USVs). Traditional methods for USV obstacle avoidance rely mainly on fine-grained environmental modeling. However, in complex marine environments, USVs have difficulty obtaining complete perception states, which leads to insufficient model accuracy. To address this issue, we propose an improved proximal policy optimization (PPO)-based autonomous obstacle avoidance method for USVs. First, a perception and decision framework for USVs based on a Markov decision process is constructed. Then, a feature-sharing representation optimization module is designed by fusing recurrent neural networks into the PPO algorithm to enhance the USV's memory of temporal environmental perception. Finally, an autonomous obstacle avoidance reward function is designed by combining reward reshaping mechanisms to speed up the optimization of the USV obstacle avoidance strategy. To verify the effectiveness of the proposed algorithm, a typical verification scenario for USV autonomous obstacle avoidance algorithms is constructed on a three-dimensional simulation platform. Experimental results show that the improved PPO-based method achieves collision-free autonomous navigation for USVs and outperforms the traditional PPO algorithm in terms of model convergence speed, collision rate, and timeout rate.
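The abstract does not give the form of the reward function, so the following is only a minimal sketch of how a shaped obstacle-avoidance reward could look, not the authors' design. It assumes a 2D position, a potential equal to the negative distance to the goal, and the standard potential-based shaping term F = γΦ(s') − Φ(s). The `StepInfo` fields, the reward magnitudes, and `goal_radius` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass
class StepInfo:
    """Hypothetical per-step summary of the simulator state (not from the paper)."""
    prev_pos: Point
    pos: Point
    goal: Point
    collided: bool
    timed_out: bool


def potential(pos: Point, goal: Point) -> float:
    # Assumed potential: negative Euclidean distance to the goal.
    return -math.dist(pos, goal)


def shaped_reward(info: StepInfo, gamma: float = 0.99, goal_radius: float = 1.0,
                  r_goal: float = 10.0, r_collision: float = -10.0,
                  r_timeout: float = -5.0) -> float:
    """Sparse terminal rewards plus a dense potential-based shaping term."""
    if info.collided:
        return r_collision          # episode ends in a collision
    if info.timed_out:
        return r_timeout            # episode exceeds the step budget
    if math.dist(info.pos, info.goal) <= goal_radius:
        return r_goal               # goal reached
    # Shaping term F = gamma * Phi(s') - Phi(s); it grows as the USV closes in on the goal.
    return gamma * potential(info.pos, info.goal) - potential(info.prev_pos, info.goal)
```

The dense shaping term gives the policy a learning signal on every step instead of only at episode end, which is the usual reason shaping speeds up policy optimization. The collision and timeout penalties correspond to the two failure rates reported in the abstract.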

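Keywords: unmanned surface vehicle; autonomous obstacle avoidance; proximal policy optimization; sequential decision-making; reward reshaping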

Classification number: U664.82 [Transportation Engineering - Ship and Waterway Engineering]

 
