The trustworthiness of large language models in anterior teeth aesthetic restoration

Authors: ZHU Guohui (朱国慧); CHEN Chunxia (陈春霞) (Department of Prosthodontics, Tianjin Stomatological Hospital, School of Medicine, Nankai University, Tianjin Key Laboratory of Oral and Maxillofacial Function Reconstruction, Tianjin 300041, China)

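Affiliation: [1] Department of Prosthodontics I, Tianjin Stomatological Hospital; School of Medicine, Nankai University; Tianjin Key Laboratory of Oral and Maxillofacial Function Reconstruction, 300041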

Source: Journal of Practical Stomatology (《实用口腔医学杂志》), 2025, Issue 1, pp. 88-92 (5 pages)

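Funding: Tianjin Natural Science Foundation (No. 22JCYBJC01240); Medical Education Research Project of the Medical Education Branch of the Chinese Medical Association and the National Center for Medical Education Development (No. 2023B175); Key Discipline Construction and Cultivation Project in Prosthodontics, Tianjin Stomatological Hospital (No. 22XFZD6).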

Abstract: Objective: To evaluate the trustworthiness of generative artificial intelligence, specifically Chinese large language models (LLMs), in answering questions on anterior teeth aesthetic restoration, and to explore how related artificial intelligence techniques can improve the trustworthiness of existing models on dental questions. Methods: Four leading Chinese LLMs, BaiChuan 3.0 (A), ZhiPu QingYan GLM-4 (B), Wenxin Yiyan 3.5 (C), and Tongyi QianWen (D), were tested on ten typical questions about anterior teeth aesthetic restoration. Reference answers were established from authoritative sources, the academic literature, and expert opinion, and the accuracy of each model's answers was compared against them. Bar charts of the recall rate and hallucination rate of each model on each question were used to compare performance. Chain-of-Thought (CoT) prompting was then added to the interaction to observe whether it improved the recall and hallucination rates on these questions. For models A and B, the web-search function was enabled to observe whether retrieval-augmented generation (RAG) improved answer quality. Results: The mean recall rates of models A, B, C, and D were 0.4167±0.13, 0.3505±0.20, 0.3587±0.01, and 0.5619±0.04, respectively, and the mean hallucination rates were 0.4651±0.04, 0.6946±0.13, 0.5018±0.08, and 0.3119±0.09, respectively. An independent-samples t-test comparing D with A showed that the advantage of D in recall rate and hallucination rate was significant (t≈15.53, P<0.05). With CoT, overall recall improved, but the hallucination rates of some models also rose. Enabling RAG in models A and B significantly increased recall and reduced hallucination rates (P<0.05). Conclusion: The methods or features employed by the Tongyi QianWen LLM showed a significant advantage in improving answer accuracy and reducing false information, giving it higher trustworthiness on anterior teeth aesthetic restoration questions. CoT can raise the correct-response rate of some models but may also raise their hallucination rates. In contrast, the RAG strategy can improve the correctness of LLMs and reduce false outputs, strengthening their reliability and practicality in anterior teeth aesthetic restoration.
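The abstract reports recall and hallucination rates as mean ± SD across questions but does not give the scoring formulas. The following is a minimal sketch under assumed definitions: recall as the share of reference key points an answer covers, and hallucination rate as the share of an answer's statements judged unsupported. The `ScoredAnswer` type, its field names, and the sample numbers are hypothetical illustrations, not the study's protocol or data.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class ScoredAnswer:
    """Hypothetical expert annotation of one model answer to one question."""
    reference_points: int  # key points in the reference answer
    points_covered: int    # reference key points the model answer reproduced
    statements: int        # total factual statements in the model answer
    unsupported: int       # statements judged incorrect or unsupported


def recall_rate(a: ScoredAnswer) -> float:
    """Assumed definition: covered reference points / all reference points."""
    return a.points_covered / a.reference_points


def hallucination_rate(a: ScoredAnswer) -> float:
    """Assumed definition: unsupported statements / all statements made."""
    return a.unsupported / a.statements if a.statements else 0.0


def summarize(values: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) across questions."""
    return mean(values), stdev(values)


# Made-up annotations for three questions; not data from the study.
answers = [
    ScoredAnswer(reference_points=6, points_covered=3, statements=8, unsupported=2),
    ScoredAnswer(reference_points=5, points_covered=2, statements=10, unsupported=5),
    ScoredAnswer(reference_points=4, points_covered=3, statements=4, unsupported=1),
]

recall_mean, recall_sd = summarize([recall_rate(a) for a in answers])
halluc_mean, halluc_sd = summarize([hallucination_rate(a) for a in answers])
print(f"recall: {recall_mean:.4f} ± {recall_sd:.2f}")
print(f"hallucination: {halluc_mean:.4f} ± {halluc_sd:.2f}")
```

Under these assumptions the two rates are scored independently, so a prompting change can raise both at once: an answer that adds statements may cover more reference points while also adding unsupported ones, which is consistent with the CoT observation reported above.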

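Keywords: large language models; anterior teeth aesthetic restoration; recall rate; hallucination rate; chain of thought; retrieval-augmented generation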

Classification: R783 [Medicine and Health: Stomatology]; TP391.73 [Medicine and Health: Clinical Medicine]

 
