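Affiliations: [1] Ordos Research Institute, Liaoning Technical University, Ordos 017000; [2] School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105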
Source: Journal of Image and Graphics, 2025, No. 3, pp. 824-841 (18 pages)
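Funding: National Natural Science Foundation of China (61601213); University-Local Science and Technology Cooperation Cultivation Project of the Ordos Research Institute, Liaoning Technical University (YJY-XD-2023-003).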
Abstract:
Objective: Estimating scene depth from a single image has become a major research topic in computer vision. Existing methods often regress depth by increasing network complexity, which raises training cost and time complexity. To address this, a multilevel perceptual conditional random field model for monocular depth estimation is proposed.
Method: An adaptive hybrid pyramid feature fusion strategy captures short-range and long-range dependencies between different positions in the image, effectively aggregating global and local context and enabling efficient information flow. A conditional random field decoding mechanism is introduced to capture fine-grained spatial dependencies between pixels. A dynamic scaling attention mechanism strengthens the perception of dependencies between different image regions, and a bias learning unit module keeps the network from falling into extreme values, ensuring model stability. To handle interactions between different feature modalities, a hierarchical perception adapter expands the feature-map dimensions to strengthen spatial and channel interaction, improving the model's feature learning ability.
Result: Ablation experiments on the NYU Depth v2 (New York University depth dataset version 2) dataset show that the proposed network significantly improves the performance metrics. Compared with the state-of-the-art methods evaluated, the absolute relative error (Abs Rel) drops below 0.1, a reduction of 7.4%, and the root mean square error (RMSE) decreases by 5.4%. To verify the model's practicality in real road environments, comparison experiments were conducted on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago) dataset, where the above metrics all outperform the mainstream methods compared: RMSE decreases by 3.1%, and the threshold accuracies (δ < 1.25² and δ < 1.25³) approach 100%. The model's generalization was further verified on the MatterPort3D dataset. Visual results show that in complex environments the proposed method better estimates depth in difficult regions.
Conclusion: By using a multilevel feature extractor and a hybrid pyramid feature fusion strategy, the proposed method optimizes information transfer between the encoder and the decoder and obtains pixel-level output through fully connected decoding, effectively improving monocular depth estimation accuracy.

Objective: Predicting scene depth from a single RGB image is a complex and challenging problem. Accurate depth estimates are essential in many computer vision applications, including 3D reconstruction, autonomous driving, and robot navigation. Recovering depth from a two-dimensional image is difficult because of its inherent ambiguity and the absence of clear depth cues. Modern approaches build intricate neural networks that attempt to approximate depth maps directly. These networks typically rely on deep learning and large quantities of labeled data to learn the complex relationships between RGB pixels and their depth values. Although these methods have shown promising results, they often suffer from computational inefficiency, overfitting, and poor generalization. This research introduces a multilevel perceptual conditional random field model built solely on the Swin Transformer.
Method: First, an adaptive hybrid pyramid feature fusion approach is a fundamental component of the architecture. It is designed to capture dependencies across multiple spatial positions, both short-range and long-range. By combining feature fusion operations with various kernel shapes, it efficiently gathers both global and local contextual information. This consolidation ensures smooth information flow within the model and considerably improves the discriminative power of the feature representations, so the model becomes better at recognizing complex patterns and structures in the data, which improves performance and accuracy. Second, the decoder includes dynamic scaling attention, which markedly enhances the model's capacity to capture complex dependency relationships among various re…
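The Abs Rel, RMSE and threshold-accuracy (δ < 1.25^k) figures quoted in the results are conventionally computed as below. This is a minimal plain-Python sketch assuming the standard definitions used in the monocular depth literature; the function name `depth_metrics` and the default evaluation range (`min_depth`, `max_depth`) are illustrative assumptions, not the authors' evaluation code.

```python
import math
from typing import Dict, Sequence


def depth_metrics(
    pred: Sequence[float],
    gt: Sequence[float],
    min_depth: float = 1e-3,
    max_depth: float = 10.0,
) -> Dict[str, float]:
    """Abs Rel, RMSE and delta-threshold accuracy over valid pixels.

    pred, gt: flattened predicted / ground-truth depth maps of equal length.
    min_depth, max_depth: evaluation range (hypothetical defaults; NYU-style
    protocols often cap at 10 m, KITTI-style ones at 80 m).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    # Keep pixels whose ground truth lies inside the evaluation range (sparse
    # ground truth such as KITTI's marks missing depth with 0), and clip the
    # prediction into the same range so the ratios below are well defined.
    pairs = [
        (min(max(p, min_depth), max_depth), g)
        for p, g in zip(pred, gt)
        if min_depth < g < max_depth
    ]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)

    # Abs Rel = mean(|d - d*| / d*)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # RMSE = sqrt(mean((d - d*)^2))
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    metrics = {"abs_rel": abs_rel, "rmse": rmse}
    # delta_k: fraction of pixels with max(d / d*, d* / d) < 1.25**k, k = 1, 2, 3
    for k in (1, 2, 3):
        threshold = 1.25 ** k
        hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
        metrics[f"delta{k}"] = hits / n
    return metrics
```

For instance, `depth_metrics([1.0, 2.2, 3.0, 5.0], [1.0, 2.0, 4.0, 0.0])` ignores the last pixel (no ground truth), gives Abs Rel = 0.35 / 3 ≈ 0.117, and δ1 = 2/3 because the 3.0 m vs 4.0 m pixel has a ratio of 4/3 > 1.25, while δ2 = δ3 = 1. A δ2 or δ3 accuracy close to 100% therefore means nearly every valid pixel lies within a factor of 1.25² (1.5625) or 1.25³ (about 1.95) of its ground-truth depth.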
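Keywords: monocular depth estimation; conditional random field; hybrid pyramid feature fusion (HPF); dynamic scaling attention; hierarchical perception adapter (HA)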
CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]