IQAGPT: computed tomography image quality assessment with vision-language and ChatGPT models


Authors: Zhihao Chen, Bin Hu, Chuang Niu, Tao Chen, Yuxin Li, Hongming Shan, Ge Wang

Affiliations: [1] Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China [2] Department of Radiology, Huashan Hospital, Fudan University, Shanghai 200040, China [3] Biomedical Imaging Center, Center for Biotechnology and Interdisciplinary Studies, Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, US [4] MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200032, China [5] Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Ministry of Education), Fudan University, Shanghai 200433, China

Source: Visual Computing for Industry, Biomedicine, and Art (工医艺的可视计算, English edition), 2024, Issue 1, pp. 165-181 (17 pages)

Funding: Supported in part by the National Natural Science Foundation of China, No. 62101136; Shanghai Sailing Program, No. 21YF1402800; National Institutes of Health, Nos. R01CA237267, R01HL151561, R01EB031102, and R01EB032716.

Abstract: Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in various tasks and attracted increasing interest as a natural language interface across many domains. Recently, large vision-language models (VLMs) that learn rich vision-language correlation from image-text pairs, like BLIP-2 and GPT-4, have been intensively investigated. However, despite these developments, the application of LLMs and VLMs in image quality assessment (IQA), particularly in medical imaging, remains unexplored. Such an application would be valuable for objective performance evaluation and could supplement or even replace radiologists' opinions. To this end, this study introduces IQAGPT, an innovative computed tomography (CT) IQA system that integrates an image-quality captioning VLM with ChatGPT to generate quality scores and textual reports. First, a CT-IQA dataset comprising 1,000 CT slices with diverse quality levels is professionally annotated and compiled for training and evaluation. To better leverage the capabilities of LLMs, the annotated quality scores are converted into semantically rich text descriptions using a prompt template. Second, the image-quality captioning VLM is fine-tuned on the CT-IQA dataset to generate quality descriptions. The captioning model fuses image and text features through cross-modal attention. Third, based on the quality descriptions, users verbally request ChatGPT to rate image-quality scores or produce radiological quality reports. Results demonstrate the feasibility of assessing image quality using LLMs. The proposed IQAGPT outperformed GPT-4 and CLIP-IQA, as well as multitask classification and regression models that rely solely on images.
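The score-to-text step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual template, score scale, or quality criteria, so everything named here (`LEVEL_WORDS`, `TEMPLATE`, `scores_to_caption`, `build_rating_request`, the 0-3 scale, and the example criteria) is a hypothetical assumption and not the authors' implementation. The second helper only assembles a plain-text request that a user might send to ChatGPT; it makes no API call.

```python
from typing import Mapping

# Hypothetical wording for each score level. The real scale and phrasing
# used for the CT-IQA dataset are not stated in the abstract.
LEVEL_WORDS = {0: "poor", 1: "fair", 2: "good", 3: "excellent"}

# Hypothetical prompt template: one sentence per annotated criterion.
TEMPLATE = "The {criterion} of this CT image is {level}."


def scores_to_caption(scores: Mapping[str, int]) -> str:
    """Convert annotated numeric scores into a text description."""
    sentences = []
    for criterion, score in scores.items():
        if score not in LEVEL_WORDS:
            raise ValueError(f"score {score!r} for {criterion!r} is outside the assumed 0-3 scale")
        sentences.append(TEMPLATE.format(criterion=criterion, level=LEVEL_WORDS[score]))
    return " ".join(sentences)


def build_rating_request(caption: str) -> str:
    """Wrap a quality description in a plain-text request for an LLM (no API call)."""
    return (
        "Here is a quality description of a CT image: "
        f"{caption} "
        "Based on this description, rate the image quality on a scale from 0 to 3."
    )


if __name__ == "__main__":
    caption = scores_to_caption({"overall quality": 3, "structure visibility": 1})
    print(caption)
    print(build_rating_request(caption))
```

In this sketch the text description is the only link between the captioning model and the language model, which mirrors the pipeline in the abstract: the VLM is trained to emit such descriptions, and the LLM turns them into scores or reports.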

Keywords: Deep learning; Medical imaging; Image captioning; Multimodality; Large language model; Vision-language model; GPT-4; Subjective evaluation

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
