BEV-Locator: an end-to-end visual semantic localization network using multi-view images


Authors: Zhihuang ZHANG, Meng XU, Wenqiang ZHOU, Tao PENG, Liang LI, Stefan POSLAD

Affiliations: [1] School of Information Technology & Management, University of International Business and Economics, Beijing 100029, China; [2] School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China; [3] Qcraft Inc., Beijing 100054, China; [4] School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK

Source: Science China (Information Sciences), 2025, Issue 2, pp. 130-146 (17 pages)

Funding: supported by the Beijing Higher Education Society under the 2024 General Project Scheme (Grant No. MS2024128); and by the Ningbo Philosophy and Social Science Planning Project, as part of the "Ningbo Development Blue Book 2025" initiative (Grant No. GL24-16).

Abstract: Accurate localization is fundamental to autonomous driving. Traditional visual localization frameworks approach the semantic map-matching problem with geometric models, which rely on complex parameter tuning and thus hinder large-scale deployment. In this paper, we propose BEV-Locator: an end-to-end visual semantic localization neural network using multi-view camera images. Specifically, a visual BEV (bird's-eye-view) encoder extracts and flattens the multi-view images into BEV space, while the semantic map features are structurally embedded as map query sequences. A cross-modal transformer then associates the BEV features with the semantic map queries, and the localization information of the ego-car is recursively extracted by cross-attention modules. Finally, the ego pose is inferred by decoding the transformer outputs. This end-to-end design gives the model broad applicability across different driving environments, including high-speed scenarios. We evaluate the proposed method on the large-scale nuScenes and Qcraft datasets. The experimental results show that BEV-Locator estimates vehicle poses in versatile scenarios, effectively associating cross-modal information from multi-view images and global semantic maps. The experiments report satisfactory accuracy, with mean absolute errors of 0.052 m, 0.135 m, and 0.251° in lateral translation, longitudinal translation, and heading angle, respectively.
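The core matching step described in the abstract (BEV features serving as keys/values, embedded semantic map elements serving as queries, and a pose decoded from the attended output) can be sketched with a toy NumPy cross-attention pass. All dimensions, the mean-pooling, and the linear pose head here are illustrative assumptions for exposition, not the paper's actual architecture or trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: map queries attend to BEV cells."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (num_queries, num_bev_cells)
    weights = softmax(scores, axis=-1)       # each query's distribution over BEV
    return weights @ values                  # (num_queries, d)

# Toy dimensions (assumed for illustration)
d_model   = 32       # embedding width
num_bev   = 50 * 50  # flattened BEV grid cells
num_map_q = 20       # semantic map elements embedded as query tokens

bev_features = rng.standard_normal((num_bev, d_model))    # stand-in BEV encoder output
map_queries  = rng.standard_normal((num_map_q, d_model))  # stand-in map query sequence

attended = cross_attention(map_queries, bev_features, bev_features)

# Decode a 3-DoF pose offset (lateral, longitudinal, heading) from the
# pooled attended features with a hypothetical linear head
W_pose = rng.standard_normal((d_model, 3)) * 0.01
pose = attended.mean(axis=0) @ W_pose

print(attended.shape)  # (20, 32)
print(pose.shape)      # (3,)
```

In the actual network this single attention pass is replaced by stacked transformer layers with learned projections, and the pose decoding is recursive rather than a one-shot linear map; the sketch only shows how cross-attention lets map queries aggregate evidence from the BEV grid.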

Keywords: visual localization; semantic map; bird's-eye-view; transformer; pose estimation

Classification codes: TP391.41 [Automation and Computer Technology: Computer Application Technology]; U463.6 [Automation and Computer Technology: Computer Science and Technology]

 
