Discipline Classification of Humanities and Social Sciences Academic Papers Based on Large Language Models


Authors: Hu Die, Lin Litao, Liu Liu, Shen Si, Wang Dongbo [1,2]

Affiliations: [1] College of Information Management, Nanjing Agricultural University, Nanjing, Jiangsu 210095; [2] Research Center for Humanities and Social Computing, Nanjing Agricultural University; [3] School of Information Management, Nanjing University, Nanjing, Jiangsu 210023; [4] School of Economics and Management, Nanjing University of Science and Technology

Source: Library Journal (《图书馆杂志》), 2025, No. 4, pp. 110-122 (13 pages)

Abstract: The rapid growth of academic papers and the increasingly fine-grained division of disciplines place higher demands on the automatic classification of academic literature. To examine how well large language models (LLMs) suit the discipline classification of academic papers, this study takes papers in the humanities and social sciences as an example and runs classification experiments with representative traditional machine learning models and the LLMs Qwen-7B, Llama2-7B, Llama2-7B-hsse, and GPT4, comparing the performance of the different models. It then examines how the LLMs perform when fine-tuned on data sets of different sizes. The results show that the classifier built on the domain-specific LLM Llama2-7B-hsse reaches an overall classification F1 score of 89.22% and shows a clear advantage in the 21-class comparison experiments, and that it needs only one-fifth of the data to match the classification performance of SsciBERT, a pre-trained model for the humanities and social sciences. Domain-incremental training and fine-tuning of LLMs can therefore effectively support the automatic classification of academic papers when data resources are limited, and they offer new ideas for knowledge organization and the analysis of interdisciplinarity.

Keywords: discipline classification of literature; literature indexing; large language models

Classification No.: G254.3 [Cultural Sciences: Library Science]

 
