Authors: Zhao Zhixiao (赵志枭), Hu Die (胡蝶), Liu Chang (刘畅) [1,2], Shen Si (沈思), Wang Dongbo (王东波) [1,2]
Affiliations: [1] College of Information Management, Nanjing Agricultural University, Nanjing 210095; [2] Research Center of Humanities and Social Computing, Nanjing Agricultural University, Nanjing 210095; [3] School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094
Source: Library and Information Service (《图书情报工作》), 2024, No. 13, pp. 132-143 (12 pages)
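Funding: One of the research outcomes of the Jiangsu Provincial Social Science Fund later-stage funding project "Research on the Construction and Application of Large Language Models for the Humanities and Social Sciences" (Project No. 23HQBO63).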
Abstract:
[Purpose/Significance] Taking the humanities and social sciences as its starting point, this paper compares the performance of large language models in this domain from two aspects: basic domain knowledge and academic texts. It aims to provide a systematic large language model evaluation benchmark for the humanities and social sciences, as a reference for researchers in related fields.
[Method/Process] Seven evaluation tasks related to the humanities and social sciences were designed, and corresponding metrics were selected. On this basis, currently available open-source, high-performing general-domain Chinese large language models were selected; the locally deployed models were invoked to complete the domain-specific tasks in question-and-answer form, and their performance in the humanities and social sciences was quantitatively evaluated with the selected metrics.
[Result/Conclusion] Among the open-source models selected, for both base models and dialog models, Qwen performs best, followed closely by Baichuan2, then InternLM, with Atom performing worst. In addition, in most cases the dialog models outperform their base counterparts.
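Classification: C1 [Sociology]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TP391.1 [Automation and Computer Technology: Control Science and Engineering]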