Single-View Endoscopic Surgical Light Field Reconstruction Combining Vision Transformer and Diffusion Model (Invited)

Authors: Han Chenming (韩晨明); Wu Gaochang (吴高昌)

Affiliation: [1] State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, Liaoning, China

Source: Laser & Optoelectronics Progress (激光与光电子学进展), 2024, No. 16, pp. 183-193 (11 pages)

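Funding: National Natural Science Foundation of China (62103092, 61991404); Fundamental Research Funds for the Central Universities, Ministry of Education (N2108001, N2424004); Research Program of the Liaoning Liaohe Laboratory (LLL23ZZ-05-01).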

Abstract: To address the difficulty of 3D perception in endoscopic surgery caused by depth-estimation uncertainty and occlusion in single-view images, this paper proposes a single-view multi-plane image (MPI) representation method that fuses a vision Transformer with a conditional diffusion model for endoscopic surgical light field reconstruction. First, a vision Transformer tokenizes the single-view input image, decomposing it into multiple image patches, and extracts features that combine local and global correlations through a multi-head attention mechanism. Then, a multi-scale convolutional decoder reassembles and fuses the patch features from coarse to fine to generate an initial MPI. Finally, to address occlusion between tissues in single-view endoscopic surgery, a background prediction module based on a conditional diffusion model is introduced. This module obtains an occlusion mask from the initial MPI and, conditioned on the mask and the input view, predicts the distribution of the occluded regions, which effectively addresses the view inconsistency within the light field caused by single-view input. The proposed method combines the initial MPI decomposed by the vision Transformer with the background regions predicted by the conditional diffusion model to obtain an optimized MPI, from which the sub-view images of the endoscopic surgical light field are rendered. Experimental results on a real endoscopic surgery dataset from the da Vinci surgical robot show that the proposed method outperforms existing single-view light field reconstruction methods in both visual quality and objective evaluation metrics.
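
Two steps named in the abstract, patch tokenization and rendering from an MPI, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `patchify` and `composite_mpi`, the array layouts, and the back-to-front plane ordering are assumptions made for illustration. The per-plane homography warp needed to render a shifted sub-view, the Transformer itself, and the diffusion-based background prediction are all omitted.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping patch tokens.

    Returns an array of shape (num_patches, patch * patch * C).
    Hypothetical helper; a real vision Transformer would also apply
    a learned linear projection and positional embeddings.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return x.reshape(-1, patch * patch * c)

def composite_mpi(colors, alphas):
    """Back-to-front "over" compositing of MPI planes.

    colors: (D, H, W, 3), alphas: (D, H, W, 1); plane 0 is assumed
    to be the farthest plane. Returns an (H, W, 3) image.
    """
    out = np.zeros_like(colors[0], dtype=float)
    for rgb, a in zip(colors, alphas):
        # Each nearer plane covers what is behind it in proportion to alpha.
        out = rgb * a + out * (1.0 - a)
    return out

# Tiny example: an opaque red far plane behind a half-transparent blue plane.
far = np.zeros((1, 1, 3)); far[..., 0] = 1.0
near = np.zeros((1, 1, 3)); near[..., 2] = 1.0
colors = np.stack([far, near])
alphas = np.stack([np.ones((1, 1, 1)), np.full((1, 1, 1), 0.5)])
pixel = composite_mpi(colors, alphas)[0, 0]

tokens = patchify(np.arange(16).reshape(4, 4, 1), 2)
```

In this toy case the composited pixel is an equal mix of red and blue, and the 4×4 image yields four tokens of four values each. Regions where the nearer planes are nearly opaque hide the planes behind them, and those are the areas for which the abstract's background prediction module supplies content.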

Keywords: light field reconstruction; vision Transformer; multi-plane image representation; conditional diffusion model

Classification: O436 [Mechanical Engineering: Optical Engineering]
