3D Reconstruction of Real Views Combining Image Segmentation with a Multi-Scale Attention Transformer

Authors: HAO Senxuan; XIAO Yihan

Affiliation: [1] Key Laboratory of Advanced Marine Communication and Information Technology, Ministry of Industry and Information Technology, Harbin Engineering University, Harbin 150001, Heilongjiang, China

Source: Applied Science and Technology, 2025, No. 1, pp. 189-197 (9 pages)

Abstract: To address the poor quality of 3D reconstruction from real views, this paper proposes a real-view 3D reconstruction method that combines image segmentation with a multi-scale attention Transformer. The method has two parts: segmentation of the original images and 3D reconstruction. First, the target object is segmented from multi-view real images using an improved DeepLabv3+ model. The segmented images are then fed into an enhanced Transformer model, trained on composite views and incorporating multi-scale attention, which outputs a 3D voxel reconstruction. In the segmentation part, the backbone of the original DeepLabv3+ model is replaced with an optimized MobileNetv2 network to reduce the number of model parameters. In the reconstruction part, a coarse-to-fine multi-scale attention mechanism is introduced into the Transformer network to aggregate global-to-local features; in addition, a refiner with a multi-scale cubic attention mechanism corrects the voxel model to improve reconstruction accuracy. Experiments on the ShapeNet dataset and a real-view dataset show that the method improves both the speed and the accuracy of real-view 3D reconstruction and outperforms several other reconstruction models.

Keywords: real views; 3D reconstruction; voxel model; Transformer model; attention mechanism; image segmentation; DeepLabv3+ model; ShapeNet dataset

Classification: TP391.41 [Automation and Computer Technology - Computer Application Technology]
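
Illustrative sketch: the abstract describes a two-stage pipeline in which the target object is first segmented from each real view and only the segmented images are passed to the reconstruction network. The hand-off between the two stages might look roughly like the NumPy sketch below. It is not the authors' implementation. The function name `mask_views`, the array layouts, the 0.5 threshold, and the blank-background fill are all assumptions made for illustration, and the segmentation model is assumed to emit a per-pixel foreground probability for each view.

```python
import numpy as np


def mask_views(views, fg_prob, threshold=0.5, background=0.0):
    """Keep only the segmented object in each view (hypothetical helper).

    views:   (V, H, W, 3) array of V multi-view RGB images.
    fg_prob: (V, H, W) array of foreground probabilities, assumed to come
             from a segmentation model such as the one in the abstract.
    Pixels whose probability is below `threshold` are set to `background`.
    """
    views = np.asarray(views, dtype=np.float32)
    fg_prob = np.asarray(fg_prob, dtype=np.float32)
    if views.ndim != 4 or views.shape[:3] != fg_prob.shape:
        raise ValueError("expected views (V, H, W, 3) and fg_prob (V, H, W)")

    # Boolean foreground mask, one per view.
    mask = fg_prob >= threshold

    # Broadcast the mask over the colour channels and blank the background.
    return np.where(mask[..., None], views, np.float32(background))


if __name__ == "__main__":
    # Two toy 4x4 views with a 2x2 "object" in the centre of each.
    toy_views = np.ones((2, 4, 4, 3), dtype=np.float32)
    toy_prob = np.zeros((2, 4, 4), dtype=np.float32)
    toy_prob[:, 1:3, 1:3] = 0.9

    segmented = mask_views(toy_views, toy_prob)
    print(segmented.shape)
```

In this sketch the masked views keep their original resolution, so whatever resizing or normalisation the reconstruction network expects would still have to be applied afterwards.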

 
