Authors: ZHANG Qian (张芊); CHEN Panfeng (陈攀峰); FENG Linkun (冯林坤); LIU Shuyu (刘淑钰); MA Dan (马丹); CHEN Mei (陈梅)[1,2]; LI Hui (李晖)
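Affiliations: [1] State Key Laboratory of Public Big Data, Guiyang 550000, Guizhou, China; [2] College of Computer Science and Technology, Guizhou University, Guiyang 550000, Guizhou, China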
Source: Big Data Research (《大数据》), 2024, No. 5, pp. 28-44 (17 pages)
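Funding: National Natural Science Foundation of China (No. 61462012); 2023 Guizhou Provincial Science and Technology Program (Qiankehe Support [2023] General 276); 2023 Guizhou Provincial Program for the Application and Industrialization of Scientific and Technological Achievements (Qiankehe Achievements [2023] General 010).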
Abstract: Large language models (LLMs) have demonstrated significant application potential in the medical field, but evaluating their performance in medical scenarios remains a challenge. Existing medical benchmarks are predominantly multiple-choice and struggle to assess LLM performance in pediatric scenarios comprehensively and accurately. To address this, the authors propose PeMeBench, the first Chinese pediatric medical question-answering benchmark. Built on dual-perspective evaluation dimensions and drawing on diagnosis and treatment guideline books covering 10 pediatric disease systems, PeMeBench divides pediatric medical question answering into five subtasks: disease knowledge, treatment plans, medication dosages, disease prevention, and pharmacological effects. It comprises more than 10,000 open-ended questions and introduces a multi-granularity automated evaluation scheme that combines entity recall with the detection of hallucinated sentences. The benchmark aims to provide a comprehensive and accurate assessment of LLM performance in basic pediatric medical care, to analyze the models' potential limitations in depth, and to lay a solid foundation for making medical services more intelligent.
Classification: TP399 [Automation and Computer Technology: Computer Application Technology]
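
The abstract mentions entity recall as one part of the automated evaluation but does not give its formula. As a rough illustration only, entity recall is commonly computed as the fraction of reference entities that also appear in a model's answer. The sketch below assumes simple case-insensitive substring matching; the function name `entity_recall`, the matching rule, and the example strings are all hypothetical and are not taken from PeMeBench.

```python
def entity_recall(reference_entities, answer):
    """Return the fraction of unique reference entities found in the answer.

    Assumption: an entity counts as recalled if it occurs as a
    case-insensitive substring of the answer. A real benchmark may use
    entity linking or synonym matching instead.
    """
    # Normalize and de-duplicate the reference entities, dropping blanks.
    unique = {e.strip().lower() for e in reference_entities if e.strip()}
    if not unique:
        return 0.0  # No reference entities: define recall as 0 by convention.
    text = answer.lower()
    hits = sum(1 for e in unique if e in text)
    return hits / len(unique)


# Made-up example: two of the three reference entities appear in the answer.
score = entity_recall(["fever", "cough", "rash"], "The child has Fever and a cough.")
print(round(score, 3))
```

A sentence-level hallucination check, the other component named in the abstract, would need a separate model or rule set and is not sketched here.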