A Multi-Modal Medical Image Analysis Algorithm Based on Text Guidance  Cited by: 1


Authors: FAN Lin; GONG Xun[1,2,3,4]; ZHENG Cen-yang[1,2,3,4]

Affiliations: [1] School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; [2] Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu, Sichuan 611756, China; [3] National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Chengdu, Sichuan 611756, China; [4] Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Chengdu, Sichuan 611756, China

Source: Acta Electronica Sinica (《电子学报》), 2024, No. 7, pp. 2341-2355 (15 pages)

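Funding: National Natural Science Foundation of China (No.62376231); Sichuan Province Key Research and Development Program (No.2023YFG0267); Science and Technology Project of the Health Commission of Sichuan Province (No.23LCYJ022).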

Abstract: Combining gastroscopic ultrasound with white light endoscopy can improve the accuracy of identifying gastrointestinal stromal tumors (GISTs). However, existing multi-modal methods often focus solely on image features and overlook the semantic information contained in diagnostic text, which is important for precisely understanding and diagnosing medical images. To address this issue, we propose a novel text-guided multi-modal medical image analysis framework (TMM-Net). TMM-Net uses multi-stage diagnostic text to guide model learning so that key diagnostic features are extracted from the images, and then promotes interaction between the multi-modal features through a cross-modal attention mechanism. Notably, TMM-Net simulates the clinical diagnostic process by predicting lesion attributes, which enhances interpretability. Validation experiments were conducted on a dataset of 10,025 modality data pairs from two centers. The results show that the proposed method improves accuracy by 7.7% over the current state-of-the-art GISTs diagnostic method and achieves the highest Area Under the Curve (AUC) value, 0.927; its interpretability may also better suit clinical needs.
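The abstract names a cross-modal attention mechanism but does not describe its architecture. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention, in which feature vectors from one modality act as queries over feature vectors from the other. It is not TMM-Net's actual design: the function names, the toy feature values, and the `wle_tokens` / `eus_tokens` variables are all hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product attention (an assumption, not the paper's method).

    Each query vector (from one modality) attends over the key/value
    vectors (from the other modality). Returns the attended features
    and the attention weights.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Toy, made-up features: 2 white-light tokens attend over 3 ultrasound tokens.
wle_tokens = [[1.0, 0.0], [0.0, 1.0]]
eus_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, attn = cross_attention(wle_tokens, eus_tokens, eus_tokens)
```

Each row of `attn` sums to 1, so every white-light token receives a convex combination of the ultrasound features. A full model would typically add learned projections for queries, keys, and values, which are omitted here.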

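Keywords: multi-modal fusion; model interpretability; image-text matching; gastrointestinal stromal tumors; gastroscopic ultrasound; white light endoscopy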

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
