Exploring the Application of Large Language Models in Classification Indexing (大语言模型在分类标引工作中的应用探索)  Cited by: 2

Exploration and Practice of Classification Indexing Combined with Large Language Models

Authors: JIANG Peng (姜鹏); REN Yan (任龑); ZHU Beiling (朱蓓琳) (Shanghai Library, Shanghai 200030)

Affiliation: [1] Shanghai Library, Shanghai 200030

Source: Journal of Library and Information Science in Agriculture (《农业图书情报学报》), 2024, No. 5, pp. 32-42 (11 pages)

Funding: Shanghai Library "2151 Project" (“2151工程”) program, "Applicability Evaluation of AIGC Services for Assisting Document Indexing".

Abstract: [Purpose/Significance] Classification indexing of documents is one of the basic tasks of libraries and other information institutions, and limited staff currently cannot classify the very large volume of documents. Large language models (LLMs), with their strong natural language understanding and processing capabilities, are used for natural language tasks such as text generation, automatic summarization, and text classification. They can be combined with the whole document indexing process and help relieve the pressure on classification indexing. [Method/Process] Drawing on long-term working practice at the National Newspaper Index (《全国报刊索引》), the study explores how to introduce LLMs into the classification indexing workflow in order to raise indexing efficiency, from three entry points: reducing the reading burden on indexers, using LLMs directly for classification, and combining LLMs with automatic indexing models. [Results/Conclusions] Through a series of comparative tests and analyses, a Prompt-assisted topic classification model and the ACBKSY automatic indexing model were designed. The Prompt-assisted topic classification model helps indexers quickly grasp the key points of a document and reduces their reading burden. The ACBKSY model raised overall classification accuracy by 2.16% and non-rejection accuracy by 3.77%. On this basis the actual indexing workflow was optimized. The workflow is now in use for indexing documents in the R and F main classes, and the optimized workflow can improve indexing efficiency by 1.1 to 1.4 times.

[Purpose/Significance] Document classification is one of the fundamental tasks of information service institutions such as libraries. Limited human resources make it challenging to categorize the vast number of documents, and current automatic indexing technologies are not yet fully integrated into the entire indexing process. Large language models (LLMs), with their excellent natural language understanding and processing capabilities, have been used for various natural language processing tasks such as text generation, automatic summarization, and text classification, and can be integrated into the entire classification process. [Method/Process] Drawing on the long-term practical experience of the National Newspaper Index, the study examines how to introduce LLMs into the classification and indexing process from three aspects: reducing the reading pressure on indexers, directly using LLMs for classification, and combining them with automatic indexing models. A prompt-assisted topic classification model is designed that uses the LLM to analyze and extract document content, guiding the model to output concise information summaries. This allows indexers to quickly understand the basic situation of the research, grasp the key concepts and their interrelationships, and thus quickly and accurately determine how to classify the collections. [Results/Conclusions] Because the LLM cannot be used directly for text classification tasks based on the "Chinese Library Classification" (CLC), it is combined with existing automatic indexing models to produce the ACBKSY model. The overall classification accuracy of the model improved by 2.16%, and the non-rejection accuracy increased by 3.77%. On this basis, the actual indexing workflow is optimized to make the indexing work more systematic and coherent, so that every step from document input to final classification is more efficient and accurate. This optimized workflow has been put into use in the R and F categories of the col
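
The abstract reports an overall classification accuracy and a non-rejection accuracy, but this record does not define either metric. The following is a minimal sketch under one common reading, and every name in it is hypothetical: the classifier may decline to label a document (represented here as `None`), overall accuracy counts correct labels over all documents, and non-rejection accuracy counts correct labels over only the documents the classifier did label. The authors' actual definitions may differ.

```python
from typing import Optional, Sequence, Tuple


def accuracy_with_rejection(
    gold: Sequence[str],
    predicted: Sequence[Optional[str]],
) -> Tuple[float, float]:
    """Return (overall_accuracy, non_rejection_accuracy).

    Assumption (not stated in the source): a prediction of None means the
    classifier rejected the document and left it for a human indexer.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    total = len(gold)
    # Keep only the documents the classifier actually labeled.
    answered = [(g, p) for g, p in zip(gold, predicted) if p is not None]
    correct = sum(1 for g, p in answered if g == p)

    # Rejected documents count against overall accuracy only.
    overall = correct / total if total else 0.0
    non_rejection = correct / len(answered) if answered else 0.0
    return overall, non_rejection


# Toy data with made-up top-level class letters; not results from the paper.
gold = ["R", "F", "R", "F", "R"]
predicted = ["R", "F", None, "R", "R"]
overall, non_rejection = accuracy_with_rejection(gold, predicted)
print(f"overall={overall:.2f}, non-rejection={non_rejection:.2f}")
```

Under this reading the two numbers can move independently: rejecting more uncertain documents tends to raise non-rejection accuracy while leaving more work for human indexers.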

Keywords: classification indexing; large language model; ERNIE Bot (文心一言); GPT-4

Classification number: G250.7 [Cultural Science: Library Science]

 
