Authors: Zhihao Chen, Bin Hu, Chuang Niu, Tao Chen, Yuxin Li, Hongming Shan, Ge Wang
Affiliations: [1] Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; [2] Department of Radiology, Huashan Hospital, Fudan University, Shanghai 200040, China; [3] Biomedical Imaging Center, Center for Biotechnology and Interdisciplinary Studies, Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, US; [4] MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200032, China; [5] Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Ministry of Education), Fudan University, Shanghai 200433, China
Source: Visual Computing for Industry, Biomedicine, and Art (工医艺的可视计算, English edition), 2024, No. 1, pp. 165-181 (17 pages)
Funding: Supported in part by the National Natural Science Foundation of China, No. 62101136; Shanghai Sailing Program, No. 21YF1402800; and National Institutes of Health, Nos. R01CA237267, R01HL151561, R01EB031102, and R01EB032716.
Abstract: Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in various tasks and attracted increasing interest as a natural language interface across many domains. Recently, large vision-language models (VLMs) that learn rich vision-language correlation from image-text pairs, like BLIP-2 and GPT-4, have been intensively investigated. However, despite these developments, the application of LLMs and VLMs in image quality assessment (IQA), particularly in medical imaging, remains unexplored. Such an application would be valuable for objective performance evaluation and could supplement or even replace radiologists' opinions. To this end, this study introduces IQAGPT, an innovative computed tomography (CT) IQA system that integrates an image-quality captioning VLM with ChatGPT to generate quality scores and textual reports. First, a CT-IQA dataset comprising 1,000 CT slices with diverse quality levels is professionally annotated and compiled for training and evaluation. To better leverage the capabilities of LLMs, the annotated quality scores are converted into semantically rich text descriptions using a prompt template. Second, the image-quality captioning VLM is fine-tuned on the CT-IQA dataset to generate quality descriptions. The captioning model fuses image and text features through cross-modal attention. Third, based on the quality descriptions, users verbally request ChatGPT to rate image-quality scores or produce radiological quality reports. Results demonstrate the feasibility of assessing image quality using LLMs. The proposed IQAGPT outperformed GPT-4 and CLIP-IQA, as well as multitask classification and regression models that rely solely on images.
Keywords: Deep learning; Medical imaging; Image captioning; Multimodality; Large language model; Vision-language model; GPT-4; Subjective evaluation
Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]