Fusion attention mechanism and multilayer U-Net for multiview stereo  Cited by: 9


Authors: Liu Huijie, Bai Zhengyao[1], Cheng Wei, Li Junjie[1], Xu Zhu (School of Information Science and Engineering, Yunnan University, Kunming 650500, China)

Affiliation: [1] School of Information Science and Engineering, Yunnan University, Kunming 650500

Source: Journal of Image and Graphics, 2022, No. 2, pp. 475-485 (11 pages)

Funding: Yunnan Province Major Science and Technology Special Program (202002AD080001).

Abstract:

Objective: To address the unsatisfactory overall quality of multi-view stereo (MVS) reconstruction, this paper studies the feature extraction module and the cost volume regularization module in MVS 3D reconstruction and proposes an attention-based, end-to-end deep learning architecture.

Methods: Deep features are first extracted from the input source images and the reference image, with an attention layer added at every level of the feature extraction module to capture the long-range dependencies of the depth inference task. Feature volumes in the reference camera frustum are then built through differentiable homography warping and used to construct the cost volume. Finally, a multilayer U-Net architecture regularizes the cost volume, and the final refined depth map is generated by regression combined with edge information from the reference image.

Results: The method was tested on the DTU (Technical University of Denmark) dataset. Compared with Colmap, Gipuma, and Tola, it improves the overall metric by 8.5%, 13.1%, and 31.9% and the completeness metric by 20.7%, 41.6%, and 73.3%, respectively. Compared with Camp, Furu, and SurfaceNet, it improves the overall metric by 24.8%, 33%, and 29.8%, the accuracy metric by 39.8%, 17.6%, and 1.3%, and the completeness metric by 9.7%, 48.4%, and 58.3%, respectively. Compared with Pru Mvsnet, it improves the overall metric by 1.7% and the accuracy metric by 5.8%. Compared with MVSNet, it improves the overall metric by 1.5% and the completeness metric by 7%.

Conclusion: The tests on the DTU dataset show that the proposed network architecture achieves the best overall metric reported so far, with considerable gains in completeness and accuracy and better 3D reconstruction quality.

Objective: With the rapid development of deep learning, learning-based multi-view stereo (MVS) research has made great progress. The goal of MVS is to reconstruct a highly detailed 3D geometric model of a scene or object, given a series of images together with the corresponding camera poses and intrinsic parameters (the internal and external parameters of the camera). As a branch of computer vision, MVS has developed tremendously in recent decades and is widely used in areas such as autonomous driving, robot navigation, and remote sensing. Learning-based methods can incorporate global semantic information, such as specular and reflective priors, to achieve more reliable matching. If the receptive field of the convolutional neural network (CNN) is large enough, poorly textured areas can be reconstructed better.

Existing learning-based MVS reconstruction methods fall into three main categories: voxel-based, point cloud-based, and depth map-based. The voxel-based method divides the 3D space into a regular grid and estimates whether each voxel is attached to the surface. The point cloud-based method operates directly on the point cloud, usually relying on a propagation strategy to make the reconstruction gradually denser. The depth map-based method uses the estimated depth map as an intermediate layer to decompose the complex MVS problem into relatively small per-view depth estimation problems. It focuses on only one reference image and several source images at a time, and then performs regression (fusion) on each depth map to form the final 3D point cloud model.

Although the reconstruction methods proposed so far leave room for improvement, the latest MVS benchmarks (such as the Technical University of Denmark (DTU) dataset) have shown that using depth maps as an intermediate layer achieves more accurate 3D model reconstruction. Several end-to-end neural networks have been proposed to predict the depth of the scene directly from a series of input images (for example, MVSNet and R-MVSNet). Even though the accuracy of these me
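The abstract names differentiable homography warping and depth regression but gives no formulas. The following is a minimal NumPy sketch of both steps, assuming the standard plane-induced homography for fronto-parallel sweep planes and the soft-argmin regression commonly used in MVSNet-style networks. The abstract does not confirm that the authors use exactly these forms. The function names and the camera convention (x_cam = R @ X_world + t) are illustrative assumptions.

```python
import numpy as np


def plane_sweep_homography(K_ref, R_ref, t_ref, K_src, R_src, t_src, depth):
    """Homography taking reference pixels to source pixels for the plane
    z = depth in the reference camera frame (hypothetical helper).

    Assumed convention: x_cam = R @ X_world + t for each camera.
    """
    # Relative pose from the reference camera frame to the source camera frame.
    R_rel = R_src @ R_ref.T
    t_rel = t_src - R_rel @ t_ref
    # Normal of the fronto-parallel sweep plane in the reference frame.
    n = np.array([0.0, 0.0, 1.0])
    return K_src @ (R_rel + np.outer(t_rel, n) / depth) @ np.linalg.inv(K_ref)


def warp_pixel(H, uv):
    """Apply a 3x3 homography to one pixel (u, v) and dehomogenize."""
    p = H @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]


def soft_argmin_depth(cost_volume, depth_values):
    """Expected depth per pixel from a regularized cost volume.

    cost_volume: array of shape (D, H, W), lower cost means a better match.
    depth_values: array of shape (D,), the depth hypothesis of each plane.
    """
    logits = -cost_volume
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)  # softmax over the depth axis
    # Probability-weighted sum of the depth hypotheses, shape (H, W).
    return np.tensordot(depth_values, prob, axes=(0, 0))


if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    H = plane_sweep_homography(K, np.eye(3), np.zeros(3),
                               K, np.eye(3), np.array([-0.2, 0.0, 0.0]), 2.0)
    print(warp_pixel(H, (100.0, 200.0)))
```

In a full network the homography is applied to whole feature maps with bilinear sampling, and the cost volume passes through the regularization network before regression. Both steps are omitted here to keep the sketch short.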

Keywords: attention mechanism; multilayer U-Net; differentiable homography; cost volume regularization; multi-view stereo (MVS)

Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
