Sub-Med VQA: A Medical Visual Question Answering Model Integrating Sub-Question Generation and Multimodal Reasoning

Author: Yan Jingxin (闫婧昕)

Affiliation: [1] School of Science, Beijing University of Civil Engineering and Architecture, Beijing

Source: Statistics and Application (《统计学与应用》), 2025, No. 2, pp. 115-125 (11 pages)

Abstract: Medical Visual Question Answering (Medical VQA) supports clinical diagnosis and decision-making by answering natural-language questions about medical images. Existing approaches, however, fall short in multi-step reasoning, fine-grained understanding, and interpretability. This paper proposes a model that uses a sub-question generation mechanism to decompose complex medical queries into simpler sub-questions, then reasons over them step by step with multimodal alignment and dynamic knowledge injection modules. The model focuses on the key regions of a medical image and dynamically integrates query-relevant semantics, improving the accuracy and reliability of answer generation. Experiments on the SLAKE and VQA-MED datasets show that the proposed method outperforms existing methods in answer accuracy, reasoning capability, and interpretability. The work offers an efficient solution for multimodal information integration and complex reasoning in Medical VQA tasks and provides new ideas for clinical diagnosis and intelligent medical research.

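Keywords: medical visual question answering; sub-question generation; multimodal alignment; dynamic knowledge injection; step-by-step reasoning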

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
