Authors: HAN Ting (韩汀), CHEN Siyu (陈思宇), MA Jin (马津)[1], CAI Guorong (蔡国榕)[2], ZHANG Wuming (张吴明), CHEN Yiping (陈一平)
Affiliations: [1] School of Geospatial Engineering and Science, Sun Yat-Sen University, Zhuhai 519082, Guangdong, China; [2] School of Computer Engineering, Jimei University, Xiamen 361021, Fujian, China
Source: Geomatics and Information Science of Wuhan University (《武汉大学学报(信息科学版)》), 2024, No. 4, pp. 582-594 (13 pages)
Funding: National Natural Science Foundation of China (42371343).
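Abstract: Road freespace (drivable area) detection is a key foundation for scene perception in driver assistance systems. Methods based on convolutional neural networks struggle to capture global context and therefore tend to produce incomplete results, such as holes and interruptions in the detected road. Transformer-based methods lack local understanding and tend to produce misaligned boundaries or boundaries that spill beyond the road. To overcome the shortcomings of both families, the authors propose a pyramid Transformer network guided by a learnable depth position encoding, which fuses convolutional neural networks with a Transformer for road freespace detection. The framework builds a pyramid Transformer backbone that extracts road features with a global receptive field, adds local window attention to compensate for lost detail, and uses shrinking self-attention to make feature computation more efficient. Because conventional position encodings in Transformers ignore the spatial relationship between pixels and the real scene, the authors construct a learnable position encoding from convolutional features of the depth image, which addresses the attention drift and semantic misalignment caused by this disconnect from the real world. The method was tested and evaluated on the KITTI road dataset, Cityscapes, and a self-built road dataset of Xiamen. The results show that it is stable and accurate while remaining efficient: its maximum F-measure reaches 97.53% on KITTI and 98.54% on Cityscapes, which is better than all methods currently listed on the KITTI road benchmark. The method can provide high-precision semantic prior information for tasks such as path planning and trajectory prediction in driver assistance systems.

Abstract (authors' English version): Objectives: Freespace detection is a crucial foundation for scene perception in advanced driver assistance systems. Convolutional neural network-based methods are unable to build global contextual information, which produces holes and interruptions in the predicted results. Transformer-based methods, in turn, lack local understanding, which results in misaligned and overshooting boundaries. Methods: To this end, we propose a pyramid Transformer architecture with learnable depth position encoding for road freespace detection. First, the pyramid Transformer backbone is designed to extract road features from a global perspective. Second, local window attention is employed in dual-Transformer blocks to compensate for detail loss. Finally, to address the problem that traditional, non-learnable position encoding ignores the spatial correlation between pixels and the real world, a learnable position encoding is constructed from convolutional features of the depth image to resolve attention and semantic misalignment. Results: The model is tested and evaluated on the KITTI road, Cityscapes, and Xiamen road datasets. The results show that our method achieves a maximum F-measure of 97.53% on KITTI and 98.54% on Cityscapes. Conclusions: Our method outperforms existing algorithms on the KITTI road benchmark, combining high efficiency with higher stability and accuracy. It also provides high-precision semantic prior information for tasks such as path planning and trajectory prediction in driver assistance systems.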
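Keywords: Transformer; position encoding; road perception; freespace detection; autonomous driving
Classification code: P208 [Astronomy and Earth Sciences: Cartography and Geographic Information Engineering]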
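
The abstract reports results as a maximum F-measure. As a rough illustration of what that number means, the sketch below sweeps a decision threshold over per-pixel road probabilities and keeps the best F1 score. It is not the authors' code and does not reproduce the official benchmark tooling (the KITTI road benchmark, for instance, evaluates in bird's-eye-view space). The function name `max_f_measure`, the flat-list inputs, and the choice of 101 evenly spaced thresholds are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def max_f_measure(
    probs: Sequence[float],
    labels: Sequence[int],
    num_thresholds: int = 101,  # assumed sweep resolution, not from the paper
) -> Tuple[float, float]:
    """Return (best F1, threshold) over evenly spaced thresholds in [0, 1].

    probs  -- predicted road probability per pixel, flattened
    labels -- ground truth per pixel (1 = road, 0 = not road)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if num_thresholds < 2:
        raise ValueError("num_thresholds must be at least 2")

    best_f, best_t = 0.0, 0.0
    for i in range(num_thresholds):
        t = i / (num_thresholds - 1)
        tp = fp = fn = 0
        for p, y in zip(probs, labels):
            predicted_road = p >= t
            if predicted_road and y:
                tp += 1
            elif predicted_road and not y:
                fp += 1
            elif not predicted_road and y:
                fn += 1
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives.
        denom = 2 * tp + fp + fn
        f = 2 * tp / denom if denom else 0.0
        if f > best_f:  # strict '>' keeps the lowest threshold on ties
            best_f, best_t = f, t
    return best_f, best_t


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
    truth = [1, 1, 0, 1, 0, 0]
    best_f, best_t = max_f_measure(scores, truth)
    print(f"max F = {best_f:.4f} at threshold {best_t:.2f}")
```

For real images the per-pixel loop would normally be vectorized, but the plain version keeps the definition of the metric visible.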