A Study on Automatic Categorization of the Siku Quanshu Based on a Large Language Model  Cited by: 1

Original title: 基于大语言模型的《四库全书》自动分类研究


Authors: Zuo Liang (左亮); Zhao Zhixiao (赵志枭); Wang Dongbo (王东波)[3] (Digital Humanities Research Center, Nanjing Agricultural University, Nanjing 210095; School of Sociology and Population Studies, School of Social Work, Nanjing University of Posts and Telecommunications, Nanjing 210023; School of Information Management, Nanjing Agricultural University, Nanjing 210095)

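Affiliations: [1] Digital Humanities Research Center, Nanjing Agricultural University, Nanjing 210095 [2] School of Sociology and Population Studies, School of Social Work, Nanjing University of Posts and Telecommunications, Nanjing 210023 [3] School of Information Management, Nanjing Agricultural University, Nanjing 210095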

Source: Journal of Information Resources Management (《信息资源管理学报》), 2024, No. 5, pp. 23-35 (13 pages)

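Funding: An output of the National Social Science Fund of China Major Project "Construction and Application of a Cross-Language Knowledge Base of Ancient Chinese Classics" (21&ZD331).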

Abstract: As research on ancient books surges and their revitalization becomes a contemporary priority, automatic classification of ancient books faces higher demands. Drawing on current state-of-the-art large language models, this study uses a corpus of 25 categories from the Shi (History) and Jing (Classics) branches of the Siku Quanshu as input to examine how well the Xunzi series of large language models for ancient books performs on automatic classification of ancient books. Comparison experiments against their base models show that the Xunzi models have a clear advantage on this task; the advantage is most pronounced for Xunzi-Baichuan2-7B, whose overall classification F1 score reaches 96.90%. Experiments that vary the size of the training data show that the Xunzi large language model for ancient books needs only a small amount of data to achieve classification results comparable to those of the base model. The automatic classification model proposed in this study, built on the Xunzi large language models for ancient books, can therefore classify ancient books efficiently and at a fine granularity, and it opens a new path for classifying ancient books in resource-constrained settings.

Keywords: Siku Quanshu; classification model; Xunzi large language model for ancient books; automatic text classification

Classification code: G256 [Cultural Sciences: Library Science]

 
