FR-PVT: A feature-refined pyramid vision transformer for accurate image segmentation

Authors: NIE Yingwang; WANG Lei; MEI Chenyang; CHEN Hao [1] (School of Ophthalmology & Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China)

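Affiliation: [1] School of Ophthalmology and Optometry (School of Biomedical Engineering), Wenzhou Medical University, Wenzhou, Zhejiang 325027, China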

Source: Journal of Wenzhou Medical University, 2024, No. 8, pp. 631-640 (10 pages)

Funding: National Natural Science Foundation of China (62006175).

Abstract: Objective: To accurately extract target regions from medical images for morphological assessment and clinical disease monitoring, a hybrid network combining a convolutional neural network (CNN) and a Transformer was explored to learn local and global image information simultaneously.

Methods: ① A novel feature-refined segmentation network, the feature-refined pyramid vision transformer (FR-PVT), was developed by introducing a CNN-based decoder and integrating it with the pyramid vision transformer (PVT). The decoder refines the multi-scale global features captured by the PVT and consists of a feature refinement module (FRM), a context attention module (CAM), and a similarity aggregation module (SAM). ② To validate FR-PVT, it was used to segment polyps in five public colonoscopy image datasets (ClinicDB, ColonDB, EndoScene, ETIS, and KvasirSEG) and palpebral fissures in frame images from the eye videography dataset provided by the Eye Hospital of Wenzhou Medical University. ③ The performance of FR-PVT was evaluated with four metrics: Dice coefficient, IOU, Matthews correlation coefficient (MCC), and Hausdorff distance (Hdf). FR-PVT was compared with existing networks (Polyp-PVT, U-Net, and several U-Net variants) on the same segmentation tasks.

Results: ① FR-PVT handled colonoscopy images acquired under various imaging conditions and achieved average Dice coefficients of 0.937, 0.819, 0.892, 0.800, and 0.909 on the testing subsets of ClinicDB, ColonDB, EndoScene, ETIS, and KvasirSEG, respectively. ② On frame images from the eye videography dataset, FR-PVT obtained an average Dice, IOU, MCC, and Hdf of 0.966, 0.943, 0.957, and 4.706, respectively. ③ Across the five polyp datasets, FR-PVT obtained an average Dice of 0.840 and an average IOU of 0.764, outperforming Polyp-PVT (0.834 and 0.760), U-Net (0.561 and 0.493), U-Net++ (0.546 and 0.476), SFA (0.476 and 0.367), and PraNet (0.741 and 0.675). For the eye videography images, the Chinese abstract gives an average Dice of 0.840 and an average IOU of 0.764 for FR-PVT (these values repeat the polyp averages and differ from those in ②); the English abstract is truncated at this point ("Performance differences on frame images from the eye videograph…").

Conclusion: FR-PVT achieved better segmentation performance than Polyp-PVT and several existing CNN-based networks, such as U-Net and its variants.
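The four evaluation metrics named in the abstract can be illustrated on binary masks. The following is a minimal sketch, not the authors' evaluation code: the function names, the handling of empty masks, and the choice to compute the Hausdorff distance over all foreground pixels (in pixel units, by brute force) are assumptions made for illustration only.

```python
import math
import numpy as np


def confusion_counts(pred, target):
    """Return (tp, fp, fn, tn) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    tn = int(np.sum(~pred & ~target))
    return tp, fp, fn, tn


def dice(pred, target):
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


def iou(pred, target):
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    denom = tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else tp / denom


def mcc(pred, target):
    """Matthews correlation coefficient from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: return 0.0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def hausdorff(pred, target):
    """Symmetric Hausdorff distance between foreground pixel sets.

    Brute force over all pixel pairs, so only suitable for small masks.
    Assumption: returns infinity if either mask has no foreground.
    """
    a = np.argwhere(np.asarray(pred, dtype=bool)).astype(float)
    b = np.argwhere(np.asarray(target, dtype=bool)).astype(float)
    if len(a) == 0 or len(b) == 0:
        return float("inf")
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

Dice and IOU reward overlap, MCC also accounts for true negatives, and the Hausdorff distance is an error measure, so lower values are better for it while higher values are better for the other three.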

Keywords: image segmentation; deep learning; convolutional block; pyramid vision transformer; colon polyp; palpebral fissure

Classification: TP391 [Automation and Computer Technology - Computer Application Technology]
