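Author: Yan Jingxin (闫婧昕)
Affiliation: [1] School of Science, Beijing University of Civil Engineering and Architecture, Beijing
Source: Statistics and Application (《统计学与应用》), 2025, No. 2, pp. 115-125 (11 pages)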
Abstract: Medical Visual Question Answering (Medical VQA) supports clinical diagnosis and decision-making by answering natural-language questions about medical images. However, existing approaches fall short in multi-step reasoning, fine-grained understanding, and interpretability. This paper proposes a model that uses a sub-question generation mechanism to decompose complex medical queries into simpler sub-questions, then reasons step by step with multimodal alignment and dynamic knowledge injection modules. The model focuses on the key regions of a medical image and dynamically integrates query-relevant semantics, improving the accuracy and reliability of answer generation. Experiments on the SLAKE and VQA-MED datasets show that the proposed method outperforms existing approaches in answer accuracy, reasoning capability, and interpretability. The work offers an efficient solution for multimodal information integration and complex reasoning in Medical VQA tasks and provides new ideas for clinical diagnosis and intelligent medical research.
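Keywords: medical visual question answering; sub-question generation; multimodal alignment; dynamic knowledge injection; step-by-step reasoning
Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]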