UFormer: An End-to-End Feature Point Scene Matching Algorithm Based on Transformer and U-Net


Authors: XIN Rui; ZHANG Xiaoli; PENG Xiafu; CHEN Jinwen (School of Aerospace Engineering, Xiamen University, Xiamen, Fujian 361005, China)

Affiliation: [1] School of Aerospace Engineering, Xiamen University, Xiamen 361005, Fujian, China

Source: Computer Science (《计算机科学》), 2023, No. S02, pp. 334-339 (6 pages)

Funding: Aeronautical Science Foundation of China (201958068002).

Abstract: Most current scene matching algorithms rely on traditional feature point matching, whose pipeline consists of feature detection followed by feature matching; on weakly textured scenes, both the accuracy and the matching success rate of such methods are low. UFormer proposes an end-to-end scheme that performs Transformer-based feature extraction and matching, using attention mechanisms to improve robustness to weakly textured scenes. Inspired by the U-Net architecture, UFormer builds a coarse-to-fine, sub-pixel-level mapping between images on top of an encoder-decoder structure. The encoder uses interleaved self- and cross-attention layers to detect and extract features of the image pair at each scale, establishes feature correspondences, and downsamples for coarse-grained matching, which provides initial positions. The decoder upsamples to restore image resolution, fuses the attention feature maps at each scale to achieve fine-grained matching, and refines the matches to sub-pixel accuracy by taking an expectation. A ground-truth homography matrix is introduced to compute a Euclidean-distance loss on the coordinates of the coarse- and fine-grained matched point pairs, supervising the learning of the network. By fusing feature detection and feature matching, UFormer has a simpler structure and improves real-time performance while maintaining accuracy, and it can cope with weakly textured scenes to a certain extent. On a collected UAV flight trajectory dataset, compared with SIFT, coordinate accuracy improves by 0.183 pixels, matching time drops to 0.106 s, and the matching success rate on weakly textured scene images is higher.
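The abstract's fine-grained refinement step, refining matches to sub-pixel accuracy "by taking an expectation", is commonly implemented as a soft-argmax over a local correlation heatmap. A minimal NumPy sketch of that idea (an illustration under assumed conventions, not the paper's actual implementation):

```python
import numpy as np

def soft_argmax_2d(heatmap: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Expected (x, y) coordinate under a softmax over a correlation heatmap.

    Returns sub-pixel coordinates in heatmap pixel units.
    """
    h, w = heatmap.shape
    logits = heatmap.flatten() / temperature
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    ys, xs = np.mgrid[0:h, 0:w]
    x = float((probs * xs.flatten()).sum())
    y = float((probs * ys.flatten()).sum())
    return np.array([x, y])

# Two equal peaks at (row 2, cols 2 and 3): the expectation lands
# between the two peak columns, i.e. a sub-pixel x estimate.
hm = np.zeros((5, 5))
hm[2, 2] = hm[2, 3] = 4.0
print(soft_argmax_2d(hm))
```

Unlike a hard argmax, the expectation is differentiable, which is what allows the refinement stage to be trained end-to-end with the rest of the network.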
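The supervision signal described above, the Euclidean distance between predicted match coordinates and their ground-truth positions under a known homography, can be sketched as follows (hypothetical helper names; plain NumPy in place of the paper's training framework):

```python
import numpy as np

def warp_points(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 3x3 homography to an (N, 2) array of pixel coordinates."""
    ones = np.ones((pts.shape[0], 1))
    homo = np.hstack([pts, ones]) @ H.T          # (N, 3) homogeneous coords
    return homo[:, :2] / homo[:, 2:3]            # perspective divide

def matching_loss(H_gt: np.ndarray, pts_a: np.ndarray,
                  pts_b_pred: np.ndarray) -> float:
    """Mean Euclidean distance between predicted matches in image B and
    the ground-truth positions obtained by warping image-A points."""
    pts_b_gt = warp_points(H_gt, pts_a)
    return float(np.linalg.norm(pts_b_gt - pts_b_pred, axis=1).mean())

# Pure translation by (5, -3): perfect predictions give zero loss.
H = np.array([[1, 0, 5], [0, 1, -3], [0, 0, 1]], dtype=float)
pts = np.array([[10.0, 20.0], [30.0, 40.0]])
print(matching_loss(H, pts, pts + [5, -3]))      # 0.0
```

In the paper this loss is applied at both the coarse and the fine matching stages; the sketch shows a single stage for clarity.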

Keywords: scene matching; attention mechanism; visual localization; deep learning

CLC Number: TN967.2 [Electronics and Telecommunications: Signal and Information Processing]

 
