Authors: Zuo Liang; Zhao Zhixiao; Wang Dongbo [3]
Affiliations: [1] Digital Humanities Research Center, Nanjing Agricultural University, Nanjing 210095; [2] School of Sociology and Population Studies, School of Social Work, Nanjing University of Posts and Telecommunications, Nanjing 210023; [3] School of Information Management, Nanjing Agricultural University, Nanjing 210095
Source: Journal of Information Resources Management (《信息资源管理学报》), 2024, No. 5, pp. 23-35 (13 pages)
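Funding: One of the research outcomes of the National Social Science Fund of China Major Project "Construction and Application of a Cross-Language Knowledge Base of Ancient Chinese Classics" (21&ZD331).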
Abstract: The surge of interest in ancient book research, together with the contemporary demand for revitalizing ancient books, places higher requirements on the automatic classification of ancient books. Drawing on current state-of-the-art large language models, this study takes 25 categories of text from the History (Shi) and Classics (Jing) sections of the Siku Quanshu as the input corpus and examines how well the Xunzi series of large language models for ancient books performs on automatic classification. Comparison experiments against the corresponding base models show that the Xunzi models have a clear advantage on this task. The Xunzi-Baichuan2-7B model shows the most significant advantage, with an overall classification F1 score of 96.90%. Experiments that vary the size of the training data further show that the Xunzi models need only a small amount of data to reach classification performance comparable to that of the base models. The automatic classification model proposed in this study, based on the Xunzi large language models for ancient books, can therefore classify ancient books efficiently at a fine-grained level, and it opens up a new approach to ancient book classification in resource-constrained settings.