ViTH: Improved Vision Transformer Hashing Algorithm for Medical Image Retrieval

Authors: LIU Chuansheng; DING Weiping [1]; CHENG Chun [1]; HUANG Jiashuang; WANG Haipeng (School of Information Science and Technology, Nantong University, Nantong 226019, Jiangsu, China)

Affiliation: [1] School of Information Science and Technology, Nantong University, Nantong 226019, Jiangsu, China

Source: Journal of Southwest University (Natural Science Edition), 2024, No. 5, pp. 11-26 (16 pages)

Funding: National Natural Science Foundation of China (61976120, 62102199); Humanities and Social Sciences Youth Foundation of the Ministry of Education (21YJCZH013); Natural Science Foundation of Jiangsu Province (BK20231337); Major Project of Natural Science Research of Jiangsu Higher Education Institutions (21KJA510004); Postgraduate Research and Practice Innovation Program of Jiangsu Province (SJCX22_1615).

Abstract: Effective retrieval of massive medical image collections is of great significance to medical diagnosis and treatment. Hashing is a mainstream approach to image retrieval, but it has seen relatively little application to medical images. To address this, an improved Vision Transformer hashing algorithm for medical image retrieval is proposed. First, a Vision Transformer model serves as the basic feature-extraction module. Second, a Power-Mean Transformation (PMT) is added at both the front and the back end of the Transformer encoder to further strengthen the nonlinearity of the model. Next, Spatial Pyramid Pooling (SPP) is introduced into the Multi-Head Attention (MHA) layers inside the Transformer encoder to form a Multi-Head Spatial Pyramid Pooling Attention (MHSPA) module, which extracts not only global contextual features but also multi-scale local contextual features, and fuses the features of different scales. Finally, after the output PMT layer, the extracted features are fed into two Multi-Layer Perceptrons (MLPs): the upper-branch MLP predicts the image category, while the lower-branch MLP learns the hash code of the image. The loss function jointly considers pairwise loss, quantization loss, balance loss, and classification loss to optimize the whole model. Experimental results on the medical image datasets ChestX-ray14 and ISIC 2018 show that the proposed algorithm achieves better retrieval performance than classical hashing algorithms.
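The abstract describes the MHSPA module only at a high level. The PyTorch sketch below illustrates one plausible reading, in which spatial pyramid pooling is applied to the key/value tokens of multi-head attention at several grid sizes before attention is computed, so that queries attend to a fused set of global and multi-scale local context tokens. The class name MHSPAttention, the pool sizes (1, 2, 4), and all layer shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a multi-head spatial-pyramid-pooling attention (MHSPA) block.
# Assumption: SPP pools the key/value tokens at several grid sizes before attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MHSPAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, pool_sizes=(1, 2, 4)):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.pool_sizes = pool_sizes          # pyramid levels for SPP (assumed)
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) patch tokens on an H x W grid
        # (any class token is assumed to be handled separately).
        B, N, C = x.shape
        H = W = int(N ** 0.5)
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        # Spatial pyramid pooling: pool the token grid at each level and
        # concatenate the pooled tokens, fusing global and local context.
        grid = x.transpose(1, 2).reshape(B, C, H, W)
        pooled = [F.adaptive_avg_pool2d(grid, s).flatten(2).transpose(1, 2)
                  for s in self.pool_sizes]        # each: (B, s*s, C)
        ctx = torch.cat(pooled, dim=1)             # (B, sum(s*s), C)

        kv = self.kv(ctx).reshape(B, -1, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(2, 0, 3, 1, 4).unbind(0)  # each: (B, heads, M, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```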

Keywords: medical image retrieval; Vision Transformer; hashing; power-mean transformation; spatial pyramid pooling
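The dual-branch head and the four loss terms are likewise described only in prose. The sketch below shows one conventional realization, assuming a tanh-relaxed hash branch, an inner-product pairwise loss, and simple weighting coefficients; the names ViTHHead and vith_loss, the hash length, the hidden size, and the weights w are hypothetical.

```python
# Minimal sketch of the classification/hashing head and the combined objective
# (pairwise + quantization + balance + classification); loss forms are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ViTHHead(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int, hash_bits: int = 48):
        super().__init__()
        # Upper branch predicts the class label; lower branch learns the hash code.
        self.cls_mlp = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                     nn.Linear(512, num_classes))
        self.hash_mlp = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                      nn.Linear(512, hash_bits), nn.Tanh())

    def forward(self, feat):
        return self.cls_mlp(feat), self.hash_mlp(feat)


def vith_loss(logits, codes, labels, sim, w=(1.0, 0.1, 0.1, 1.0)):
    """codes: (B, K) relaxed codes in (-1, 1); sim: (B, B) float, 1 for similar pairs."""
    # Pairwise loss: code inner products should match pairwise similarity.
    inner = codes @ codes.t() / codes.size(1)
    pair = F.mse_loss(inner, 2 * sim - 1)
    # Quantization loss: drive relaxed codes toward the binary values {-1, +1}.
    quant = (codes.abs() - 1).pow(2).mean()
    # Balance loss: encourage each bit to be half +1 / half -1 over the batch.
    balance = codes.mean(dim=0).pow(2).mean()
    # Classification loss on the upper branch.
    cls = F.cross_entropy(logits, labels)
    return w[0] * pair + w[1] * quant + w[2] * balance + w[3] * cls
```

At retrieval time the relaxed codes would typically be binarized with torch.sign and compared by Hamming distance, which is the usual motivation for the quantization term above.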

CLC Number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
