Self-supervised Monocular Image Depth Estimation Primed by Transformer and Multi-scale Attention Scheme  Cited by: 2


Authors: LIANG Shui-bo; LIU Zi-yan; SUN Hao-kun; YUAN Hao; LIANG Jing (College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China)

Affiliation: [1] College of Big Data and Information Engineering, Guizhou University, Guiyang 550025

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2023, No. 4, pp. 825-831 (7 pages)

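Funding: Supported by the Guizhou Provincial Science and Technology Foundation (Qiankehe Jichu [2016]1054); the Guizhou Provincial Joint Fund (Qiankehe LH Zi [2017]7226); the Guizhou University 2017 Special Project for Academic New Talent Cultivation and Innovative Exploration (Qiankehe Pingtai Rencai [2017]5788); and the Guizhou Provincial Science and Technology Plan Project (Qiankehe SY Zi [2011]3111).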

Abstract: Existing self-supervised monocular depth estimation methods produce blurred edges and unclear object contours on higher-resolution images. To address this, the paper proposes a monocular image depth estimation network that combines a visual Transformer with multi-scale channel attention fusion. First, an encoder-decoder model is designed in which a visual Transformer serves as the encoder and extracts features at multiple scales. Second, a Residual Channel Attention (RCA) decoder refines the extracted multi-scale features and fuses features from adjacent upper and lower levels to make better use of contextual information. Finally, depth is estimated from the monocular image at multiple scales. In experiments on the KITTI dataset, the proposed algorithm yields higher-quality depth maps and clearer object contours than existing algorithms. Its absolute relative error, squared relative error, and root mean square error are 0.119, 0.857, and 4.571, respectively, and its accuracy under the different thresholds reaches 0.959, 0.995, and 0.999. These results support the feasibility and effectiveness of the proposed algorithm.
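The error and accuracy figures quoted in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' evaluation code. The function name `depth_metrics`, the convention that a ground-truth value of zero marks an invalid pixel, and the thresholds 1.25, 1.25^2, and 1.25^3 are assumptions taken from common practice in depth-estimation benchmarks; the abstract does not state which thresholds were used.

```python
import numpy as np


def depth_metrics(pred, gt):
    """Compute common depth-estimation metrics (hypothetical helper).

    pred, gt: arrays of predicted and ground-truth depth, same shape.
    Assumption: gt == 0 marks pixels without ground truth, and
    predictions are strictly positive on the valid pixels.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Keep only pixels that have ground truth.
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]

    diff = pred - gt
    abs_rel = float(np.mean(np.abs(diff) / gt))   # absolute relative error
    sq_rel = float(np.mean(diff ** 2 / gt))       # squared relative error
    rmse = float(np.sqrt(np.mean(diff ** 2)))     # root mean square error

    # Threshold accuracy: share of pixels whose ratio max(pred/gt, gt/pred)
    # falls below 1.25**k (assumed thresholds, k = 1, 2, 3).
    ratio = np.maximum(pred / gt, gt / pred)
    acc = [float(np.mean(ratio < 1.25 ** k)) for k in (1, 2, 3)]

    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "acc": acc}


if __name__ == "__main__":
    # Toy example with made-up depths; the zero in gt is ignored.
    gt = np.array([2.0, 4.0, 0.0])
    pred = np.array([1.0, 4.0, 3.0])
    print(depth_metrics(pred, gt))
```

Lower values are better for the three error terms, and higher values are better for the threshold accuracies, which is the sense in which the abstract's numbers should be read.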

Keywords: deep learning; monocular image depth estimation; Transformer; self-supervised learning; channel attention

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
