Authors: Shen Lili (沈立力)[1], Jiang Peng (姜鹏), Wang Jing (王静)[1] (Shanghai Library)
Affiliation: [1] Shanghai Library (Institute of Scientific and Technical Information of Shanghai), Shanghai 200031
Source: Library Journal (《图书馆杂志》), 2022, No. 5, pp. 109-118, 135 (11 pages)
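Funding: This work is one of the research outcomes of the Shanghai Library Youth Sailing Program special project "Research and Application of Intelligent Classification and Indexing of Digital Literature Resources Based on Deep Learning".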
Abstract: The BERT model released by the Google AI team has achieved strong results on a number of natural language processing tasks, but its use for the automatic classification of Chinese literature remains largely unexplored. This paper examines how well BERT's Chinese base model classifies Chinese social science and sci-tech periodical literature in practice, points out the problems that arise when the model is applied, and proposes solutions. A total of 1 745 000 records from category R (medicine, health), category TG (metallurgy and metalworking), category F (economics), and category J (art) were selected as the training corpus, and a further 9 610 records were used as test samples. The BERT model was then used to classify the social science and sci-tech periodical literature separately. The test results show that the model's four-level accuracy is 76.95% for social science literature and 68.55% for sci-tech literature. A penalty strategy is then introduced to provide a reference for setting the threshold for inspection-exempt data in practical work. The BERT model shows some feasibility for the actual classification and indexing work of the Quan Guo Bao Kan Suo Yin (CNBKSY, 《全国报刊索引》) and basically meets the need for automatic classification of Chinese documents in the current network environment.
Keywords: BERT model; deep learning; literature classification; Chinese Library Classification (《中国图书馆分类法》)