BEV-Locator: an end-to-end visual semantic localization network using multi-view images


Authors: Zhihuang ZHANG, Meng XU, Wenqiang ZHOU, Tao PENG, Liang LI, Stefan POSLAD

Affiliations: [1] School of Information Technology & Management, University of International Business and Economics, Beijing 100029, China; [2] School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China; [3] Qcraft Inc., Beijing 100054, China; [4] School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK

Source: Science China (Information Sciences), 2025, Issue 2, pp. 130-146 (17 pages)

Funding: supported by the Beijing Higher Education Society under the 2024 General Project Scheme (Grant No. MS2024128); and by the Ningbo Philosophy and Social Science Planning Project, as part of the "Ningbo Development Blue Book 2025" initiative (Grant No. GL24-16).

Abstract: Accurate localization is fundamental to autonomous driving. Traditional visual localization frameworks approach the semantic map-matching problem with geometric models, which rely on complex parameter tuning and thus hinder large-scale deployment. In this paper, we propose BEV-Locator: an end-to-end visual semantic localization neural network using multi-view camera images. Specifically, a visual BEV (bird's-eye-view) encoder extracts and flattens the multi-view images into BEV space, while the semantic map features are structurally embedded as map query sequences. A cross-modal transformer then associates the BEV features with the semantic map queries, and the localization information of the ego-car is recursively extracted by cross-attention modules. Finally, the ego pose is inferred by decoding the transformer outputs. This end-to-end design gives the model broad applicability across different driving environments, including high-speed scenarios. We evaluate the proposed method on the large-scale nuScenes and Qcraft datasets. The experimental results show that BEV-Locator estimates vehicle poses in versatile scenarios, effectively associating cross-modal information from multi-view images and global semantic maps. The experiments report satisfactory accuracy, with mean absolute errors of 0.052 m, 0.135 m, and 0.251° in lateral translation, longitudinal translation, and heading angle, respectively.
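The core matching step described in the abstract (BEV features serving as keys/values, embedded semantic map elements serving as queries, and a pose decoded from the attended output) can be sketched with a toy NumPy cross-attention pass. All dimensions, the mean-pooling, and the linear pose head here are illustrative assumptions for exposition, not the paper's actual architecture or trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: map queries attend to BEV cells."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (num_queries, num_bev_cells)
    weights = softmax(scores, axis=-1)       # each query's distribution over BEV
    return weights @ values                  # (num_queries, d)

# Toy dimensions (assumed for illustration)
d_model   = 32       # embedding width
num_bev   = 50 * 50  # flattened BEV grid cells
num_map_q = 20       # semantic map elements embedded as query tokens

bev_features = rng.standard_normal((num_bev, d_model))    # stand-in BEV encoder output
map_queries  = rng.standard_normal((num_map_q, d_model))  # stand-in map query sequence

attended = cross_attention(map_queries, bev_features, bev_features)

# Decode a 3-DoF pose offset (lateral, longitudinal, heading) from the
# pooled attended features with a hypothetical linear head
W_pose = rng.standard_normal((d_model, 3)) * 0.01
pose = attended.mean(axis=0) @ W_pose

print(attended.shape)  # (20, 32)
print(pose.shape)      # (3,)
```

In the actual network this single attention pass is replaced by stacked transformer layers with learned projections, and the pose decoding is recursive rather than a one-shot linear map; the sketch only shows how cross-attention lets map queries aggregate evidence from the BEV grid.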

Keywords: visual localization; semantic map; bird's-eye-view; transformer; pose estimation

Classification codes: TP391.41 [Automation and Computer Technology: Computer Application Technology]; U463.6 [Automation and Computer Technology: Computer Science and Technology]

 
