Assessing the possibility of using large language models in ocular surface diseases  

Authors: Qian Ling, Zi-Song Xu, Yan-Mei Zeng, Qi Hong, Xian-Zhe Qian, Jin-Yu Hu, Chong-Gang Pei, Hong Wei, Jie Zou, Cheng Chen, Xiao-Yu Wang, Xu Chen, Zhen-Kai Wu, Yi Shao

Affiliations: [1] Department of Ophthalmology, the First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, Jiangxi Province, China; [2] Ophthalmology Centre of Maastricht University, Maastricht 6200MS, Limburg, Netherlands; [3] Changde Hospital, Xiangya School of Medicine, Central South University (the First People's Hospital of Changde City), Changde 415000, Hunan Province, China; [4] Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai 200080, China

Source: International Journal of Ophthalmology (English edition), 2025, No. 1, pp. 1-8 (8 pages)

Funding: Supported by National Natural Science Foundation of China (No. 82160195, No. 82460203); Degree and Postgraduate Education Teaching Reform Project of Jiangxi Province (No. JXYJG-2020-026).

Abstract:

AIM: To assess the possibility of using large language models (LLMs) in ocular surface diseases by testing the accuracy of five LLMs in answering specialized questions on ocular surface diseases: ChatGPT-4, ChatGPT-3.5, Claude 2, PaLM2, and SenseNova.

METHODS: A group of experienced ophthalmology professors were asked to develop a 100-question single-choice exam on ocular surface diseases, designed to assess the performance of LLMs and human participants in answering ophthalmology specialty exam questions. The exam covered the following topics: keratitis (20 questions); keratoconus, keratomalacia, corneal dystrophy, corneal degeneration, erosive corneal ulcers, and corneal lesions associated with systemic diseases (20 questions); conjunctivitis (20 questions); trachoma, pterygium, and conjunctival tumors (20 questions); and dry eye disease (20 questions). The total score of each LLM was then calculated, and the mean score, mean correlation, variance, and confidence of the LLMs were compared.

RESULTS: GPT-4 achieved the highest performance among the LLMs. When the average scores of the LLMs were compared with those of four human groups (chief physicians, attending physicians, regular trainees, and graduate students), every LLM except ChatGPT-4 had a total score lower than that of the graduate student group, which had the lowest score among the human groups. Both ChatGPT-4 and PaLM2 were more likely to give exact and correct answers, with very little chance of an incorrect answer. ChatGPT-4 showed higher credibility when answering questions, with a success rate of 59%, but gave the wrong answer 28% of the time.

CONCLUSION: The GPT-4 model exhibits excellent performance in both answer relevance and confidence. PaLM2 shows a positive correlation (up to 0.8) in terms of answer accuracy during the exam. In terms of answer confidence, PaLM2 is second only to GPT-4 and surpasses Claude 2, SenseNova, and GPT-3.5. Despite the fact that ocular surface disease is a highly specializ

Keywords: ChatGPT-4.0; ChatGPT-3.5; large language models; ocular surface diseases

Classification number: R779.6 [Medicine and Health: Ophthalmology]

 
